Can an author's unique “literary style” be used to identify him/her as the author of a text? [closed]

Question

1 Answer

JaneShaw · Answer 1 · 2019-07-31T12:56:38+0000

Yes, it is possible, and the record of success in identifying an author from a text, or even just a portion of one, is impressive.

A couple of representative studies (warning: links are to PDF files):

• Quantitative Analysis of Literary Styles

• Stylogenetics: Clustering-based stylistic analysis of literary corpora

To aid your web search: this discipline is usually called stylometry (and occasionally stylogenetics).

So the two most important questions are, I suppose: which classifiers are useful for this purpose, and what data is fed to the classifier?

What I still find surprising is how little data is required to achieve very accurate classification. Often the data is just a word frequency list. (A directory of word frequency lists is available online here.)

For instance, one data set widely used in machine learning, and available from a number of places on the Web, comprises text from four authors: Shakespeare, Jane Austen, Jack London, and Milton. Their works were divided into 872 pieces (corresponding roughly to chapters), or about 220 substantial pieces of text for each of the four authors; each piece becomes a single data point in the data set. Next, a word-frequency scan was performed on each text, and the 70 most common words were kept for the study; the rest of the scan results were discarded. Here are the first 20 words of that 70-word list:

['a', 'all', 'also', 'an', 'and', 'any', 'are', 'as', 'at', 'be', 'been', 'but', 'by', 'can', 'do', 'down', 'even', 'every', 'for', 'from']
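The word-selection step described above can be sketched roughly as follows. This is only an illustration, not the procedure used to build the actual data set: the `tokenize` and `top_words` helpers, the regular expression, the alphabetical ordering of the result, and the toy `chunks` are all assumptions made for the example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def top_words(chunks, n=70):
    """Return the n most frequent words across all chunks, sorted alphabetically."""
    totals = Counter()
    for chunk in chunks:
        totals.update(tokenize(chunk))
    return sorted(word for word, _ in totals.most_common(n))

# Toy stand-ins for chapter-sized pieces of text (hypothetical data).
chunks = [
    "the cat and the dog and the bird",
    "a cat sat by the door and a dog slept",
]

# With real chapters you would keep n=70; n=2 keeps the toy example readable.
vocabulary = top_words(chunks, n=2)
print(vocabulary)  # ['and', 'the']
```

Everything outside the returned vocabulary is simply ignored in later steps, which matches the "remainder discarded" description.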

Each data point, then, is just the count of each of the 70 words in one of the 872 chapters:

[78, 34, 21, 45, 76, 9, 23, 12, 43, 54, 110, 21, 45, 59, 87, 59, 34, 104, 93, 40]

Each of these data points is one instance of the author's literary fingerprint.
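To make the idea of a fingerprint concrete, here is a minimal sketch that turns a piece of text into a count vector and assigns it to the closest author. The answer does not say which classifier was used with this data set, so the nearest-centroid rule below is a stand-in chosen for simplicity; the four-word `vocabulary`, the author labels, and the training texts are all hypothetical.

```python
import math
import re
from collections import Counter

def count_vector(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    return [counts[word] for word in vocabulary]

def normalize(vector):
    """Convert raw counts to relative frequencies so chunk length matters less."""
    total = sum(vector)
    return [v / total for v in vector] if total else [0.0] * len(vector)

def centroid(vectors):
    """Average a list of equal-length vectors component by component."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def nearest_author(vector, centroids):
    """Return the author whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda author: math.dist(vector, centroids[author]))

# Hypothetical vocabulary and training chunks (the real study uses 70 words).
vocabulary = ["and", "of", "the", "to"]
training = {
    "author_a": ["the king and the queen and the court", "the sword and the crown"],
    "author_b": ["to go to sea is to dream of home", "of ships and of storms to come"],
}

# One averaged fingerprint per author.
centroids = {
    author: centroid([normalize(count_vector(t, vocabulary)) for t in texts])
    for author, texts in training.items()
}

unknown = normalize(count_vector("the duke and the duchess", vocabulary))
print(nearest_author(unknown, centroids))  # author_a
```

Normalizing to relative frequencies is a design choice of this sketch, not something stated in the answer; with chunks of similar length, raw counts could be compared directly.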

Because this kind of analysis involves a good deal of data extraction and data mining, studying machine learning algorithms is a good way to build the background it requires.
