Part 2 of 3

The Product Data Audit — What We Actually Find

Verbal+Visual · 2026

The goal of a product data overhaul is to get the team back to merchandising instead of data maintenance.

That framing matters because product data projects tend to get scoped as technical initiatives. They get handed to a developer or an agency with a brief that says "clean up our metafields" or "migrate our tags." The work gets done in isolation, the team doesn't fully understand what changed or why, and six months later the same patterns start creeping back in.

A product data audit is an operational efficiency project that happens to involve data architecture. The output is a cleaner, more maintainable system. The outcome is a team that spends its time on merchandising and growth instead of managing field dependencies and spreadsheet imports.

What a product data audit actually covers

The scope varies by store complexity, but the core areas are consistent: metafield architecture review, tag analysis, data ownership mapping, population rate analysis, and feed and channel dependency mapping.
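Population rate analysis is the most mechanical of these steps, and a small script makes the idea concrete. The sketch below assumes a product export flattened to one dict of metafield key → value per product; the field names are illustrative, not a real store's schema.

```python
from collections import Counter

def population_rates(products, field_keys):
    """Return the fraction of products with a non-empty value for each field.

    `products` is a list of dicts mapping metafield key -> value, e.g. as
    flattened from a product export. Empty strings and empty lists count
    as unpopulated, the same as a missing key.
    """
    counts = Counter()
    for product in products:
        for key in field_keys:
            if product.get(key) not in (None, "", []):
                counts[key] += 1
    total = len(products)
    return {key: counts[key] / total for key in field_keys}

products = [
    {"custom.gender": "Mens", "custom.fit": "Regular"},
    {"custom.gender": "Womens", "custom.fit": ""},
    {"custom.gender": "Mens"},
]
rates = population_rates(products, ["custom.gender", "custom.fit"])
# custom.gender is fully populated; custom.fit is populated on 1 of 3 products
```

A field populated on 3% of products is usually a removal candidate; a field populated on 97% usually has a handful of broken products worth fixing.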

The dependency mapping is often the most consequential step. You cannot safely recommend removing a field without confirming it isn't a feed dependency. We've seen stores where a seemingly unused metafield was the only thing keeping several hundred products visible on Google Shopping. Without that mapping, a well-intentioned cleanup can create problems worse than the ones it was trying to solve.
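The dependency check itself reduces to set logic once the mapping exists. This is a minimal sketch, assuming you have already inventoried which consumers (feeds, templates, apps) read which fields; the consumer and field names are hypothetical.

```python
def safe_to_remove(candidate_fields, dependency_map):
    """Split removal candidates by whether any consumer still reads them.

    `dependency_map` maps each consumer (feed, template, app) to the list
    of fields it reads. Returns (safe_fields, blocked_fields), where
    blocked_fields maps a field to the consumers that depend on it.
    """
    in_use = {field for fields in dependency_map.values() for field in fields}
    blocked = {f: sorted(c for c, deps in dependency_map.items() if f in deps)
               for f in candidate_fields if f in in_use}
    safe = sorted(set(candidate_fields) - in_use)
    return safe, blocked

dependencies = {
    "google_shopping_feed": ["custom.gtin", "custom.gender"],
    "collection_template": ["custom.gender"],
}
safe, blocked = safe_to_remove(["custom.gender", "custom.old_badge"], dependencies)
# custom.old_badge has no consumers; custom.gender is blocked by two
```

The hard part is building `dependency_map` honestly, which means reading feed configurations and grepping theme code, not just exporting a field list.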

The patterns that keep showing up

Across multiple audits, the same themes emerge. The specifics vary by store, but the structural patterns are remarkably consistent.

Redundancy and sprawl

The same piece of information stored in two, three, sometimes four places. A product attribute lives as a tag, a product-level metafield, a variant-level metafield, and a Liquid variable computed at render time. Three of the four are out of sync, and nobody is sure which one is the source of truth.

One attribute, four sources — which one is the source of truth?

"Gender"
- Tag: fil|gender:mens → "Mens" (in sync)
- Product metafield: custom.gender → "Men's" (drifted)
- Variant metafield: variant.gender → "Male" (drifted)
- Liquid variable: {% assign gender = ... %} → "mens" (in sync)

Four sources, three different values. Nobody knows which one is authoritative.
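Drift like this is easy to surface programmatically. The sketch below, with hypothetical source names, normalizes away case and punctuation so that cosmetic variants ("Mens" vs "Men's") group together and only genuinely different values stand out.

```python
def normalize(value):
    """Fold case and punctuation so 'Mens', "Men's", and 'mens' compare equal."""
    return "".join(ch for ch in value.lower() if ch.isalnum())

def drift_report(sources):
    """Group the sources of one attribute by normalized value.

    `sources` maps a source name to the raw value it stores. Returns
    {normalized_value: [source names]}; more than one key means the
    sources genuinely disagree, not just in formatting.
    """
    groups = {}
    for name, raw in sources.items():
        groups.setdefault(normalize(raw), []).append(name)
    return groups

gender_sources = {
    "tag": "Mens",
    "product_metafield": "Men's",
    "variant_metafield": "Male",
    "liquid_variable": "mens",
}
report = drift_report(gender_sources)
# two groups: three sources agree on "mens", the variant metafield says "male"
```

Running this per attribute across the catalog turns a vague "our data has drifted" into a ranked list of the worst offenders.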

This is where Part 1's customer-facing symptoms originate. Customers see inconsistent specs, wrong sizing information, and product details that don't match across the site because the data has drifted apart across multiple sources.

Manual processes that don't scale

Spreadsheet-driven imports that require perfect sequencing. Automation tools layered on top of manual processes without replacing them. One team member who knows the import logic and the edge cases.

Archaeology layers

Filter tags from 2019 still driving collection logic. Metafields added in 2022 powering the search tool. A third set of fields feeding the Google Shopping export. Each layer was added to solve a specific problem without retiring the previous solution.

No organizational logic in the data architecture

Everything in the custom.* namespace with no strategy for organizing it. Taxonomy, lifecycle, pricing, media, specs, all in one bucket. No way to look at a field name and know who owns it or whether it's still active.
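A first pass at imposing that logic is simply inventorying what exists per namespace. A minimal sketch, assuming metafield keys in the usual namespace.key form; the field names are examples, not a real store's.

```python
from collections import defaultdict

def namespace_inventory(field_keys):
    """Group metafield keys like 'custom.gender' by their namespace prefix."""
    inventory = defaultdict(list)
    for key in sorted(field_keys):
        namespace, _, name = key.partition(".")
        inventory[namespace].append(name)
    return dict(inventory)

fields = ["custom.gender", "custom.fit", "stride.show_new", "page_builder.title"]
inv = namespace_inventory(fields)
# {'custom': ['fit', 'gender'], 'page_builder': ['title'], 'stride': ['show_new']}
```

Even this trivial grouping makes the problem visible: one overstuffed custom.* bucket next to namespaces nobody recognizes is the sprawl pattern in a single printout.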

When nobody can untangle what depends on what, every change carries risk. The team stops cleaning up and starts working around the mess, and the mess grows.

How we think about recommendations

The audit produces a prioritized set of recommendations. The sequencing matters — we always start with what can be safely removed before recommending what to add.

The prioritization framework
P0 · Fix now — actively causing problems
Active bugs and data sync issues causing visible customer-facing problems. Channel routing errors, collection mismatches, checkout inconsistencies.
e.g. Tag capitalization collision: "Fullprice" vs "fullprice" silently breaking collection rules

P1 · Quick wins — high impact, low effort
Removing confirmed dead fields, consolidating obvious duplicates, standardizing tag naming. Each takes under 30 minutes. No feed risk.
e.g. Remove 11 orphaned page_builder.* fields from uninstalled app

P2 · Structural improvements — clear ROI
Tag-to-metafield migrations, namespace reorganization, automation upgrades. Requires coordination but eliminates the root cause of recurring issues.
e.g. Migrate 20 fil|* filter tags to taxonomy.* metafields, retire tag-based collections

P3 · Future architecture — depends on other decisions
PIM integration, ERP connections, governance models. These become viable after cleanup reduces the surface area.
e.g. Implement Plytix PIM as single source of truth for content, taxonomy, and media
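The P0 example above is detectable with a few lines of code. Because collection rules match tags exactly, case variants silently split products across rules; this sketch flags any tags that differ only by case.

```python
from collections import defaultdict

def tag_collisions(tags):
    """Find tags that differ only by case, e.g. 'Fullprice' vs 'fullprice'.

    Returns {lowercased tag: sorted list of variants} for every group
    with more than one spelling.
    """
    variants = defaultdict(set)
    for tag in tags:
        variants[tag.lower()].add(tag)
    return {k: sorted(v) for k, v in variants.items() if len(v) > 1}

tags = ["Fullprice", "fullprice", "sale", "Clearance"]
collisions = tag_collisions(tags)
# {'fullprice': ['Fullprice', 'fullprice']}
```

The same grouping idea extends to whitespace and separator variants ("full-price" vs "full price") by swapping in a stricter normalizer.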

Cleanup reduces the surface area for the structural work that follows. A store with 75 metafields and no documentation is overwhelming. A store with 40 metafields, each with a clear owner and purpose, is a manageable foundation for the next phase.

What the target state looks like

Every field has an owner. You should be able to look at a field's namespace and know whether it's managed by a human, an app, an ERP, or a PIM. One concept lives in one location. Data is typed — ratings should be integers, dates should be dates, controlled vocabularies should be select lists. And governance is a deliverable, not an afterthought.
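Typed data is enforceable, not just aspirational. The sketch below shows the idea as a plain validation pass; the schema entries are assumptions for illustration, not real Shopify metafield definitions (Shopify's own definition types can enforce much of this natively).

```python
from datetime import date

# Illustrative schema: field name -> validator. Field names and rules
# are hypothetical examples, not a real store's definitions.
SCHEMA = {
    "custom.warmth_rating": lambda v: isinstance(v, int) and 1 <= v <= 5,
    "custom.launch_date": lambda v: isinstance(v, date),
    "custom.gender": lambda v: v in {"Mens", "Womens", "Unisex"},
}

def validate(product):
    """Return the field names whose values violate the declared type."""
    return sorted(
        field for field, check in SCHEMA.items()
        if field in product and not check(product[field])
    )

bad = validate({"custom.warmth_rating": "4", "custom.gender": "Men's"})
# both fail: the rating is a string, the gender is outside the vocabulary
```

Run as a scheduled check, this catches drift at import time instead of six months later in an audit.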

The before and after

Before
- 75 metafields across 8 namespaces
- custom.* — 33 fields (all mixed together): custom.gender, custom.activity, custom.fit, custom.weather, custom.description, custom.features, custom.fabric, custom.waterproof_rating, custom.warmth_rating, custom.weight, custom.price_low, custom.lifestyle_img, custom.title_tag, ...
- stride.* — 4 fields (legacy system): stride.sizing_test, stride.show_new, stride.description, stride.inventory_old
- page_builder.* — 11 fields (app removed): page_builder.hero_2762GeH, page_builder.title, page_builder.description, ...

After
- Roughly 40 metafields, each with a clear owner and purpose; everything else removed, automated, or governed by a PIM

One engagement took a store from roughly 75 metafields to roughly 40. The team member who previously managed 13 interdependent fields across the product lifecycle now manages 2. Everything else is either automated or governed by a PIM. The team stopped managing data and started shipping product.

Clean product data also has a second benefit that brands often don't anticipate. The same structural improvements that make operations more efficient also make a store's catalog machine-readable in ways that matter increasingly for AI-mediated commerce. In the final post, we'll get into exactly how that connection works and why the timing matters.

← Part 1: Holding You Back · Next: Your AI Strategy →