A recent experiment with an AI-powered design analysis tool revealed how subtle bugs in color perception, shape interpretation, and typography handling can produce wildly inaccurate results. The developer behind the open-source tool brandmd discovered that its automated assessments of a bright, magazine-style webpage were embarrassingly wrong—categorizing a cream-colored background as "dark and moody" and mislabeling a vivid blue as "dark blue." These errors stemmed from oversimplified heuristics that failed to account for real-world design nuances.
Why AI often misreads design intent
Design systems rely on human perception, which prioritizes dominant visual elements over minor details. However, many AI tools analyze elements in isolation, leading to flawed conclusions. For example, a webpage might appear bright and airy to a human, but an AI could be misled by a small dark footer or a single vivid accent color. These discrepancies highlight the need for more sophisticated analysis methods that mirror human visual processing.
Five critical bugs in design-token tools
1. Incorrect mood assessment from luminance averaging
The tool initially determined a page’s mood by averaging the luminance of every color in the palette. This approach treated a tiny dark footer the same as a dominant cream background, skewing the result. For instance, a webpage with 90% cream and a small dark footer was labeled "dark and moody" simply because the average luminance dipped below a threshold.
The fix: Instead of averaging, anchor the mood assessment to the dominant background color—the one that covers the most viewport area. This method aligns with human perception, where the overall impression is shaped by the largest visual elements rather than minor details.
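The difference between the two strategies can be sketched in a few lines of Python. This is an illustrative sketch, not brandmd's actual code: the palette, the area shares, the 0.4 cutoff, the mood labels, and the function names are all assumptions made for the example.

```python
def relative_luminance(hex_color: str) -> float:
    """sRGB relative luminance in [0, 1] (Rec. 709 channel weights)."""
    h = hex_color.lstrip("#")
    channels = [int(h[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB gamma curve before weighting the channels.
    r, g, b = [
        c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


DARK_CUTOFF = 0.4  # hypothetical threshold, chosen only for this example


def label(luminance: float) -> str:
    return "dark and moody" if luminance < DARK_CUTOFF else "bright and airy"


def mood_by_average(palette: list[tuple[str, float]]) -> str:
    """Buggy approach: every palette entry counts equally, whatever its size."""
    avg = sum(relative_luminance(color) for color, _ in palette) / len(palette)
    return label(avg)


def mood_by_dominant_background(palette: list[tuple[str, float]]) -> str:
    """Fixed approach: only the color with the largest area share decides."""
    color, _ = max(palette, key=lambda entry: entry[1])
    return label(relative_luminance(color))


# (color, share of viewport area); hypothetical numbers
palette = [
    ("#F7F6F5", 0.90),  # cream page background
    ("#111111", 0.04),  # dark footer
    ("#1A1A2E", 0.03),  # dark text block
    ("#2200FF", 0.03),  # accent
]

print(mood_by_average(palette))
print(mood_by_dominant_background(palette))
```

With this hypothetical palette, three of the four swatches are dark, so the unweighted average falls below the cutoff, while the dominant-background rule reads the page from its cream background alone.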
2. Misleading color names from raw RGB distances
The tool used nearest-neighbor matching in RGB space to assign names to colors. A cream-colored background (#F7F6F5) was incorrectly labeled as "Light Muted Orange" because its slight warm cast was closer in RGB distance to an orange hue than to an off-white. This approach overlooked how humans perceive color in context.
The fix: Switch to HSL-based rules for naming colors. For example:
- Colors with lightness above 90% are classified as off-white or near-white, regardless of hue.
- Highly saturated mid-lightness colors are labeled as "vivid."
- Hue ranges are mapped to descriptive names like "blue" or "green" only when appropriate.
This change ensures that #F7F6F5 is correctly identified as "off-white" and #2200FF as "vivid blue."
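The rules above could look roughly like the following Python sketch. Only the 90% lightness rule and the two example colors come from the text; the hue ranges, the saturation and lightness cutoffs for "vivid", and the extra "near-black" and "gray" labels are assumptions added so the example is complete.

```python
import colorsys


def hex_to_hsl(hex_color: str) -> tuple[float, float, float]:
    """Return (hue in degrees, saturation, lightness), the last two in [0, 1]."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) / 255 for i in (0, 2, 4))
    hue, light, sat = colorsys.rgb_to_hls(r, g, b)  # note: colorsys order is H, L, S
    return hue * 360, sat, light


# (upper hue bound in degrees, name); hypothetical ranges
HUE_NAMES = [
    (15, "red"), (45, "orange"), (70, "yellow"), (165, "green"),
    (200, "cyan"), (260, "blue"), (330, "purple"), (360, "red"),
]


def name_color(hex_color: str) -> str:
    hue, sat, light = hex_to_hsl(hex_color)
    if light > 0.90:          # rule from the article: very light means off-white
        return "off-white"
    if light < 0.10:          # assumed counterpart for very dark colors
        return "near-black"
    if sat < 0.10:            # assumed: too little chroma for a hue name
        return "gray"
    base = next(name for upper, name in HUE_NAMES if hue <= upper)
    if sat > 0.80 and 0.35 <= light <= 0.65:  # assumed cutoffs for "vivid"
        return f"vivid {base}"
    return base


print(name_color("#F7F6F5"))
print(name_color("#2200FF"))
```

Checking lightness before hue is the important design choice here: the faint warm cast of #F7F6F5 never reaches the hue table, so it cannot be pulled toward "orange".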
3. Tone words derived from luminance instead of lightness
A bright electric blue (#2200FF) was labeled "dark blue" because the tool relied on luminance to determine tone. Luminance is weighted heavily toward green (about 72%) and red (about 21%), while blue contributes only about 7%, so a fully saturated blue scores as dark even though it looks vivid and bright to human eyes.
The fix: Use HSL lightness instead of luminance to assign tone words. Lightness treats all color channels equally, ensuring that a mid-lightness blue is correctly described as such, with saturation adding descriptors like "vivid."
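To make the gap concrete, the sketch below computes both measures for #2200FF. It assumes the standard sRGB relative-luminance formula (the source does not state which variant the tool used), and the tone cutoffs of 0.35 and 0.65 are hypothetical.

```python
def srgb_channels(hex_color: str) -> list[float]:
    h = hex_color.lstrip("#")
    return [int(h[i:i + 2], 16) / 255 for i in (0, 2, 4)]


def relative_luminance(hex_color: str) -> float:
    """Green dominates, blue barely counts (weights 0.2126 / 0.7152 / 0.0722)."""
    r, g, b = [
        c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
        for c in srgb_channels(hex_color)
    ]
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def hsl_lightness(hex_color: str) -> float:
    """Midpoint of the strongest and weakest channel; no channel is favored."""
    c = srgb_channels(hex_color)
    return (max(c) + min(c)) / 2


def tone_word(value: float) -> str:
    # Hypothetical cutoffs, used only to turn a number into a word.
    if value < 0.35:
        return "dark"
    if value > 0.65:
        return "light"
    return "mid"


blue = "#2200FF"
print(round(relative_luminance(blue), 3), tone_word(relative_luminance(blue)))
print(hsl_lightness(blue), tone_word(hsl_lightness(blue)))
```

The same color lands in the "dark" band by luminance and in the middle band by lightness, which is the band where a saturation check can then add "vivid".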
4. Extreme border-radius values distorting shape analysis
When a button used border-radius: 9999px to create a pill shape, the tool returned the raw computed value, 3.35544e+07px, as the border radius. The scientific notation looked unprofessional in design documentation, and the enormous value skewed shape analysis, leading the tool to describe a sharp-edged site as "rounded and friendly."
The fix: Normalize extreme border-radius values to a standardized label like 9999px (pill) for display while keeping the CSS valid. Additionally, exclude pill-shaped elements from shape analysis, focusing instead on the largest non-pill radius to determine whether a site is rounded or sharp.
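A minimal sketch of both halves of this fix follows. The 1000px pill threshold, the 8px cutoff between "sharp" and "rounded", and all function names are assumptions for illustration; the source only specifies the "9999px (pill)" label and the idea of ignoring pills when judging shape.

```python
import re

PILL_THRESHOLD_PX = 1000.0   # hypothetical: anything this large is a pill
ROUNDED_CUTOFF_PX = 8.0      # hypothetical: smallest radius that reads as rounded

_PX = re.compile(r"\s*([0-9.]+(?:e[+-]?\d+)?)px\s*", re.IGNORECASE)


def parse_px(value: str) -> float:
    """Parse values like '4px' or '3.35544e+07px' into a float."""
    match = _PX.fullmatch(value)
    if match is None:
        raise ValueError(f"not a px length: {value!r}")
    return float(match.group(1))


def display_radius(value: str) -> str:
    """Collapse extreme radii to one readable label; pass others through."""
    px = parse_px(value)
    return "9999px (pill)" if px >= PILL_THRESHOLD_PX else f"{px:g}px"


def shape_verdict(radii: list[str]) -> str:
    """Judge the site by its largest radius that is not a pill."""
    non_pill = [px for px in map(parse_px, radii) if px < PILL_THRESHOLD_PX]
    largest = max(non_pill, default=0.0)
    return "rounded" if largest >= ROUNDED_CUTOFF_PX else "sharp"


radii = ["3.35544e+07px", "2px", "0px"]
print([display_radius(r) for r in radii])
print(shape_verdict(radii))
```

Here the single pill button no longer outweighs the 0px and 2px corners used everywhere else, so the hypothetical site is reported as sharp.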
5. Sub-pixel font sizes creating false type scale diversity
A webpage’s type scale appeared to have seven distinct sizes between 11px and 13px (11px, 11.05px, 11.5px, 12px, 12.5px, 12.75px, 13px). In reality, these were rendering artifacts caused by sub-pixel antialiasing and zoom, not intentional design choices.
The fix: Cluster font sizes by rounding them to the nearest 0.5px before ranking and deduplication. This merges rendering noise such as 11.05px into its neighboring step (11px), so the reported type scale reflects the design system's real steps rather than seven spurious ones.
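The clustering step can be sketched as snapping each size to a grid and deduplicating. The `step` parameter and function names are assumptions for this example. On the seven sample values listed above, a 0.5px grid only absorbs 11.05px and 12.75px, leaving five sizes; a coarser 1px grid is what reduces them to three, so the right grid size depends on how fine the real scale is.

```python
import math


def snap(size_px: float, step: float) -> float:
    """Round to the nearest multiple of step, with halves rounding up."""
    return math.floor(size_px / step + 0.5) * step


def cluster_sizes(sizes: list[float], step: float = 0.5) -> list[float]:
    """Snap every size to the grid, then drop duplicates and sort."""
    return sorted({snap(size, step) for size in sizes})


observed = [11, 11.05, 11.5, 12, 12.5, 12.75, 13]

print(cluster_sizes(observed))            # 0.5px grid
print(cluster_sizes(observed, step=1.0))  # 1px grid
```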
The bigger lesson: Visual dominance isn’t about element count
While fixing the color analysis, the developer discovered a deeper flaw: element count is a poor proxy for visual dominance. A 6% alpha black overlay might be used across dozens of tiny chips, but it doesn’t register as the primary color to the human eye. The actual page background, which occupies most of the viewport area, was being overshadowed by this minor detail.
The fix: Weight colors by their viewport area share rather than their element count. This adjustment ensures that the most visually dominant colors—like a full-page cream background—are correctly identified as primary, while minor overlays are deprioritized.
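The two ranking strategies can be compared with a small sketch. The element sizes, the viewport dimensions, and the names below are hypothetical, and the area calculation deliberately ignores overlap and clipping, which a real implementation would need to handle.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Element:
    color: str
    width: float
    height: float


def rank_by_count(elements: list[Element]) -> list[str]:
    """Old behavior: the color used by the most elements comes first."""
    counts: dict[str, int] = defaultdict(int)
    for el in elements:
        counts[el.color] += 1
    return sorted(counts, key=lambda color: counts[color], reverse=True)


def rank_by_area_share(elements: list[Element], viewport_area: float) -> list[tuple[str, float]]:
    """New behavior: colors ordered by the share of the viewport they cover."""
    area: dict[str, float] = defaultdict(float)
    for el in elements:
        area[el.color] += el.width * el.height
    shares = {color: min(a / viewport_area, 1.0) for color, a in area.items()}
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


VIEWPORT = 1440 * 900  # hypothetical viewport in CSS pixels

elements = [Element("#F7F6F5", 1440, 900)]  # one full-page cream background
elements += [Element("rgba(0, 0, 0, 0.06)", 80, 24) for _ in range(40)]  # 40 small chips

print(rank_by_count(elements)[0])
print(rank_by_area_share(elements, VIEWPORT)[0])
```

In this example the overlay wins by count (40 elements to 1) but covers only about 6% of the viewport, so area weighting puts the cream background first.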
Try it yourself
The corrected heuristics are now live in brandmd v0.12. To see the difference, compare the outputs:
npx brandmd@0.11.1
npx brandmd@0.12.0
The full changelog details these changes, and the repository is available for developers looking to integrate accurate design system documentation into their workflows. By addressing these common bugs, AI tools can finally generate design analyses that align with human perception, reducing guesswork in UI development.