Project: Stylometry (Results) Part 2
I didn’t know whether my program worked, so I decided to test it. I chose authors who wrote in English so that their use of the words I was looking for would be natural and consistent.
First, I tested it on different texts by Charles Dickens, then compared those texts to ones by other authors from around the same time period:
| ||Oliver Twist||Great Expectations||A Tale of Two Cities|
|A Tale of Two Cities||60.4%||52.1%||0%|
Now, comparing different authors:
| ||Pride and Prejudice||Dracula|
|A Tale of Two Cities||240%||112%|
So, that worked. I found that texts by the same author had a percentage difference below roughly 70%, whilst texts by different authors had a difference above roughly 110%. I also saw that a lower word count would lead to less accurate results, so I wanted to try the checker on a novella by Charles Dickens, A Christmas Carol.
| ||A Christmas Carol (29,400 words)|
|Oliver Twist (171,826 words)||63.0%|
|Great Expectations (190,198 words)||97.8%|
|A Tale of Two Cities (138,330 words)||71.3%|
Thus, word count does indeed affect the percentage difference, because a shorter novella contains fewer of the typical function words used by the same author. However, even though A Christmas Carol is significantly shorter than the novels, its percentage difference was still smaller than that of the books by the other authors. I suspect that if I used a few novellas in place of the novels, the percentage difference would be even smaller. I also think that some of the tags I decided to target are a natural part of English, and there might be a more author-specific list of words. Anyway, the longer the texts, the more accurate this program is.
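To make the idea concrete, here is a rough sketch of how a comparison like this could be computed. It is not my actual program: the word list `FUNCTION_WORDS`, the per-1,000-words scaling, and the way the per-word differences are summed into one score are all assumptions made for illustration. Only the ~70% and ~110% cut-offs come from the results above.

```python
import re
from collections import Counter

# Hypothetical word list: a stand-in for whichever words the real program targets.
FUNCTION_WORDS = ("the", "and", "of", "to", "a", "in", "that", "it", "but", "with")


def profile(text):
    """Return the rate of each function word per 1,000 words of text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return {w: 1000 * counts[w] / total for w in FUNCTION_WORDS}


def percentage_difference(text_a, text_b):
    """Sum the per-word percentage differences between two profiles.

    Assumed measure: for each word, |a - b| / mean(a, b) * 100, added up
    over the word list. Identical texts score 0.
    """
    rates_a, rates_b = profile(text_a), profile(text_b)
    score = 0.0
    for word in FUNCTION_WORDS:
        mean = (rates_a[word] + rates_b[word]) / 2
        if mean > 0:  # skip words that appear in neither text
            score += abs(rates_a[word] - rates_b[word]) / mean * 100
    return score


def verdict(score, same_below=70.0, different_above=110.0):
    """Sort a score into the rough bands observed in the results."""
    if score < same_below:
        return "probably same author"
    if score > different_above:
        return "probably different author"
    return "inconclusive"
```

Scaling to a rate per 1,000 words keeps texts of different lengths comparable, but it does not remove the word count problem: a short text still gives noisier rates. With these bands, `verdict(97.8)` (A Christmas Carol against Great Expectations) comes out as "inconclusive", which matches how that result sits between the two groups.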