The Product Data Audit — What We Actually Find
The goal of a product data overhaul is to get the team back to merchandising instead of data maintenance.
That framing matters because product data projects tend to get scoped as technical initiatives. They get handed to a developer or an agency with a brief that says "clean up our metafields" or "migrate our tags." The work gets done in isolation, the team doesn't fully understand what changed or why, and six months later the same patterns start creeping back in.
A product data audit is an operational efficiency project that happens to involve data architecture. The output is a cleaner, more maintainable system. The outcome is a team that spends its time on merchandising and growth instead of managing field dependencies and spreadsheet imports.
What a product data audit actually covers
The scope varies by store complexity, but the core areas are consistent: metafield architecture review, tag analysis, data ownership mapping, population rate analysis, and feed and channel dependency mapping.
The dependency mapping is often the most consequential step. You cannot safely recommend removing a field without confirming it isn't a feed dependency. We've seen stores where a seemingly unused metafield was the only thing keeping several hundred products visible on Google Shopping. Without that mapping, a well-intentioned cleanup can create problems worse than the ones it was trying to solve.
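To make the dependency check concrete, here is a minimal sketch of the gate we mean. It assumes the mapping has already been captured as a simple inventory of which consumers read each field. The field keys, consumer names, and function names are all hypothetical, not taken from any real store or API.

```python
# Hypothetical inventory built during the audit: field key -> consumers that read it.
# Every name below is illustrative only.
FIELD_CONSUMERS = {
    "custom.gtin": ["google_shopping_feed"],
    "custom.legacy_badge": [],
    "custom.fit_notes": ["theme:product-template"],
}


def removal_blockers(field_key, consumers=FIELD_CONSUMERS):
    """Return the consumers still reading a field, or None if it was never mapped."""
    if field_key not in consumers:
        return None  # unmapped fields cannot be judged yet
    return list(consumers[field_key])


def safe_to_remove(field_key, consumers=FIELD_CONSUMERS):
    """A field is a removal candidate only if it was mapped and nothing reads it."""
    blockers = removal_blockers(field_key, consumers)
    return blockers is not None and len(blockers) == 0
```

The design choice worth noting is that an unmapped field is treated as unsafe rather than unused. A field that looks dead is only a removal candidate once someone has confirmed that no feed or template reads it.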
The patterns that keep showing up
Across multiple audits, the same themes emerge. The specifics vary by store, but the structural patterns are remarkably consistent.
Redundancy and sprawl
The same piece of information stored in two, three, sometimes four places. A product attribute lives as a tag, a product-level metafield, a variant-level metafield, and a Liquid variable computed at render time. Three of the four are out of sync, and nobody is sure which one is the source of truth.
This is where Part 1's customer-facing symptoms originate. Customers see inconsistent specs, wrong sizing information, and product details that don't match across the site because the data has drifted apart across multiple sources.
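A rough sketch of how that drift can be surfaced, assuming the values for one attribute have already been pulled from each location into a dictionary. The source labels, sample values, and the find_drift helper are hypothetical, and the normalization is deliberately simple.

```python
def find_drift(sources):
    """Group source locations by normalized value.

    Returns the groups when the locations disagree, or an empty dict
    when every location holds the same value.
    """
    groups = {}
    for location, value in sources.items():
        key = str(value).strip().lower()  # ignore case and stray whitespace
        groups.setdefault(key, []).append(location)
    return groups if len(groups) > 1 else {}


# Hypothetical example: one attribute ("material") read from four places.
material_sources = {
    "tag": "Cotton",
    "product_metafield": "cotton",
    "variant_metafield": "Organic Cotton",
    "liquid_computed": "Cotton blend",
}
drift = find_drift(material_sources)
```

In this example the tag and the product-level metafield agree once case is ignored, while the other two locations each hold a different value. The sketch only reports the disagreement. Deciding which location is the source of truth is still a human decision.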
Manual processes that don't scale
Spreadsheet-driven imports that require perfect sequencing. Automation tools layered on top of manual processes without replacing them. One team member who knows the import logic and the edge cases.
Archaeology layers
Filter tags from 2019 still driving collection logic. Metafields added in 2022 powering the search tool. A third set of fields feeding the Google Shopping export. Each layer was added to solve a specific problem without retiring the previous solution.
No organizational logic in the data architecture
Everything in the custom.* namespace with no strategy for organizing it. Taxonomy, lifecycle, pricing, media, specs, all in one bucket. No way to look at a field name and know who owns it or whether it's still active.
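As an illustration of what organizing logic could look like, the sketch below reads ownership straight from the namespace portion of a field key. The namespaces, owners, and the owner_of helper are assumptions made up for this example, not a recommended standard.

```python
# Hypothetical convention: the namespace says who manages the field.
NAMESPACE_OWNERS = {
    "merch": "merchandising team (human-managed)",
    "erp": "ERP sync (automated)",
    "pim": "PIM (automated)",
    "feed": "channel feeds (automated)",
}


def owner_of(field_key, owners=NAMESPACE_OWNERS):
    """Look up the owner of a 'namespace.key' field from its namespace."""
    namespace, _, name = field_key.partition(".")
    if not name:
        raise ValueError(f"expected 'namespace.key', got {field_key!r}")
    # Anything outside the agreed namespaces has no known owner.
    return owners.get(namespace, "unowned")
```

Under this convention, owner_of("erp.cost_price") points at the ERP sync, while anything still sitting in a catch-all namespace such as custom.fit_notes comes back as "unowned", which is exactly the list an audit needs to work through.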
How we think about recommendations
The audit produces a prioritized set of recommendations. The sequencing matters: we always start with what can be safely removed before recommending what to add.
Cleanup reduces the surface area for the structural work that follows. A store with 75 metafields and no documentation is overwhelming. A store with 40 metafields, each with a clear owner and purpose, is a manageable foundation for the next phase.
What the target state looks like
Every field has an owner. You should be able to look at a field's namespace and know whether it's managed by a human, an app, an ERP, or a PIM. One concept lives in one location. Data is typed: ratings should be integers, dates should be dates, and controlled vocabularies should be select lists. And governance is a deliverable, not an afterthought.
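The typing rule can be sketched as a few small checks. The 1 to 5 rating range, the ISO date format, and the fit vocabulary are assumptions chosen for the example. A real store would define its own.

```python
from datetime import date

# Hypothetical controlled vocabulary for a "fit" field.
ALLOWED_FITS = {"slim", "regular", "relaxed"}


def validate_rating(value):
    """Ratings must be whole numbers; the 1-5 range is an assumption."""
    # bool is a subclass of int in Python, so rule it out explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and 1 <= value <= 5


def validate_date(value):
    """Dates must parse as ISO dates such as '2024-03-01'."""
    try:
        date.fromisoformat(value)
        return True
    except (TypeError, ValueError):
        return False


def validate_fit(value):
    """Controlled vocabularies accept only values from the agreed list."""
    return value in ALLOWED_FITS
```

Free-text values such as a rating of "4.5 stars" or a launch date of "next spring" fail these checks, which is the point. Once a field is typed, that kind of entry gets rejected when it is written instead of being discovered later in a feed or a filter.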
The before and after
One engagement took a store from roughly 75 metafields to roughly 40. The team member who previously managed 13 interdependent fields across the product lifecycle now manages 2. Everything else is either automated or governed by a PIM. The team stopped managing data and started shipping product.
Clean product data also has a second benefit that brands often don't anticipate. The same structural improvements that make operations more efficient also make a store's catalog machine-readable in ways that matter increasingly for AI-mediated commerce. In the final post, we'll get into exactly how that connection works and why the timing matters.