in AI and Deep Learning by (50.2k points)

Let's say I have filenames that are formatted differently. I want to extract certain parts of each filename the way a human would, by recognizing the pattern.

I could brute-force my way through with regular expressions, but that's not what I'm after. Let's say I have these 4 strings:

[MAS] Hayate no Gotoku!! 20 [BD 720p] [21D138F8].mkv

[Leopard-Raws] Akatsuki no Yona - 05 RAW (MX 1280x720 x264 AAC).mp4

[BLAST] Wolf Girl and Black Prince - 05 [720p] [C1252A5E].mkv

[sage]_Mobile_Suit_Gundam_AGE_-_36_[720p][10bit][45C9E0D0].mkv

As you can see, all these filenames follow a certain pattern but are not quite the same, so a single silver-bullet regular expression wouldn't cut it. Instead, I want to look at computational intelligence techniques such as ANNs, or some other smart idea, to solve this problem.

Let's say we want to extract the titles from the filenames. Humans would return these values:

Hayate no Gotoku!!

Akatsuki no Yona

Wolf Girl and Black Prince

Mobile Suit Gundam AGE

Or episode numbers: 20, 05, 05, 36. You get where I'm going with this.

Which techniques would be useful to achieve the desired result? Or is this something that is still being researched at universities and has no solution yet?

1 Answer


I think grammar induction will help here. It works by having a program figure out a regular expression (or some other type of pattern) that matches certain strings but not others. You have to supply the strings yourself, though: this is the training set, made up of positive examples (strings that should be matched) and negative examples (strings that shouldn't be matched).
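
To make the idea concrete, here is a very rough sketch in Python of inducing a pattern from positive and negative examples. It is a toy, not a real grammar induction algorithm: the helper names (generalize, induce) and the example strings are my own assumptions for illustration. Each positive example is generalized into character classes, and if the result accidentally matches a negative example, the sketch falls back to the literal string.

```python
import re

# Split a string into runs of digits, ASCII letters, whitespace, or single other characters.
TOKEN = re.compile(
    r"(?P<digits>\d+)|(?P<letters>[A-Za-z]+)|(?P<space>\s+)|(?P<other>.)",
    re.DOTALL,
)
CLASS = {"digits": r"\d+", "letters": r"[A-Za-z]+", "space": r"\s+"}


def generalize(example):
    """Hypothetical helper: turn one example into a regex built from character classes."""
    parts = []
    for m in TOKEN.finditer(example):
        # Known run types become a class; anything else is kept as an escaped literal.
        parts.append(CLASS.get(m.lastgroup, re.escape(m.group())))
    return "".join(parts)


def induce(positives, negatives):
    """Hypothetical helper: build one regex from the training set."""
    alternatives = []
    for example in positives:
        candidate = generalize(example)
        # If the generalization also accepts a negative example it is too broad,
        # so fall back to matching this positive example literally.
        if any(re.fullmatch(candidate, neg) for neg in negatives):
            candidate = re.escape(example)
        if candidate not in alternatives:
            alternatives.append(candidate)
    return re.compile("|".join(f"(?:{a})" for a in alternatives))


# Made-up training set: episode markers are positive, other fragments are negative.
pattern = induce(["- 05", "- 20", "- 36"], ["[720p]", "- RAW"])
print(bool(pattern.fullmatch("- 12")))   # unseen episode marker
print(bool(pattern.fullmatch("- RAW")))  # negative example
```

A real grammar induction method searches a much larger space of patterns, but the loop is the same in spirit: generalize from the positive examples and use the negative ones to rule out patterns that are too broad.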
