Best machine learning technique for matching product strings

Question

1 Answer

Anurag · Answer 1 · 2019-06-28T05:24:10+0000

The first thing you should do is parse the names into a description of features (company LG, size 42 Inch, resolution 1080p, type LCD HDTV). Then you can match these descriptions against each other for compatibility: it's okay for one listing to omit a product number, but bad for two listings to have different sizes. A simple check that the attributes two listings have in common are compatible might be enough, or you might have to write or learn rules about how much each attribute is allowed to differ.
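As a rough sketch of that compatibility check, assuming each listing has already been parsed into a plain dict of attributes (the attribute names and the model value "X123" below are made up for illustration):

```python
def compatible(a: dict, b: dict) -> bool:
    """Return True if every attribute present in BOTH descriptions agrees.

    Attributes missing from one side are ignored, so omitting a
    product number is fine, but two different sizes is a mismatch.
    """
    shared = a.keys() & b.keys()
    return all(a[key] == b[key] for key in shared)


# Hypothetical parsed listings; "X123" is a placeholder model number.
full = {"company": "LG", "size_inch": 42, "resolution": "1080p",
        "type": "LCD HDTV", "model": "X123"}
short = {"company": "LG", "size_inch": 42, "type": "LCD HDTV"}
other = {"company": "LG", "size_inch": 47, "type": "LCD HDTV"}

print(compatible(full, short))  # shared attributes all agree
print(compatible(full, other))  # sizes differ
```

Note that two descriptions with no attributes in common count as compatible here, so in practice you would probably also want to require some minimum overlap (for example, the same company and type) before calling it a match.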

Depending on how many kinds of products you have and how much the listed names vary, I might actually start by manually defining a set of attributes and possibly even just adding specific words or regexes to match them, iteratively checking what hasn't been parsed so far and adding rules for that. I'd imagine there's not a lot of ambiguity in terms of one vocabulary item possibly belonging to multiple attributes, though without seeing your database I can't be sure.
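Here is a minimal sketch of that manual approach. The rule table, the attribute names, and the brand list are all assumptions for illustration, not something taken from your data. The point is that whatever the rules don't consume comes back as leftover tokens, which tells you which rule to write next:

```python
import re

# Hypothetical hand-written rules: attribute name -> regex with one group.
RULES = {
    "company": re.compile(r"\b(LG|Samsung|Sony)\b", re.IGNORECASE),
    "size_inch": re.compile(r"\b(\d{2,3})\s*-?\s*(?:inch|in)\b", re.IGNORECASE),
    "resolution": re.compile(r"\b(720p|1080i|1080p)\b", re.IGNORECASE),
    "type": re.compile(r"\b(LCD|LED|Plasma)\b", re.IGNORECASE),
}


def parse(name: str):
    """Extract attributes with the rules above; return (attrs, leftover tokens)."""
    attrs = {}
    rest = name
    for attr, pattern in RULES.items():
        match = pattern.search(rest)
        if match:
            attrs[attr] = match.group(1).lower()
            # Cut the matched text out so it is not reported as unparsed.
            rest = rest[:match.start()] + " " + rest[match.end():]
    return attrs, rest.split()


attrs, leftover = parse("LG 42 Inch 1080p LCD HDTV")
print(attrs)     # attributes the current rules recognised
print(leftover)  # tokens no rule covered yet, e.g. "HDTV"
```

Running this over the whole database and counting the most frequent leftover tokens is one cheap way to decide which rules to add in the next iteration.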

If that's not going to be feasible, this extraction is somewhat analogous to semi-supervised part-of-speech tagging. It differs, though, in that I imagine the vocabulary is much more limited than in typical parsing, and in that the space of product names is more hierarchical: the resolution tag only applies to certain kinds of products. I'm not very familiar with that literature, but there might be some ideas in it you could use.
