Project: Stylometry (Results) Part 2

I didn’t know whether my program worked, so I decided to test it. I chose authors who wrote in English so that their use of the words I was looking for would be natural and consistent.

First, I tested it on different texts by Charles Dickens, and then compared those texts with ones by other authors from around the same time period:

	Oliver Twist	Great Expectations	A Tale of Two Cities
Oliver Twist	0%	68.1%	60.4%
Great Expectations	68.1%	0%	52.1%
A Tale of Two Cities	60.4%	52.1%	0%

Now, comparing different authors:

	Pride and Prejudice	Dracula
Oliver Twist	218%	115%
Great Expectations	227%	127%
A Tale of Two Cities	240%	112%

So, that worked. Texts by the same author had a percentage difference below about 70%, whilst texts by different authors had a difference above about 110%. I also saw that a lower word count would lead to less accurate results, so I wanted to try the checker on a novella by Charles Dickens, A Christmas Carol.

	A Christmas Carol (29400 words)
Oliver Twist (171826 words)	63.0%
Great Expectations (190198 words)	97.8%
A Tale of Two Cities (138330 words)	71.3%

Thus, word count does indeed affect the percentage difference, because a shorter novella contains fewer of the function words typical of its author. However, even though A Christmas Carol is significantly shorter than the novels, its percentage difference was still smaller than that of the books by the other authors. I suspect that if I compared it with a few novellas instead of full-length novels, the percentage difference would be even smaller. I also think that some of the tags I decided to target are a natural part of English, and that there might be a more author-specific list of words to use instead. Anyway, the longer the texts, the more accurate this program is.
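
To make the rough cut-offs above concrete, here is a minimal sketch that applies them to the numbers from the tables. It is not part of the checker itself: the function name verdict, the constant names, and the "unclear" label for anything between the two cut-offs are my own assumptions, and the cut-offs are only the approximate values read off these few results.

```python
# Approximate cut-offs read off the results above (assumed, not tuned).
SAME_AUTHOR_MAX = 70.0        # same-author pairs scored below roughly 70%
DIFFERENT_AUTHOR_MIN = 110.0  # different-author pairs scored above roughly 110%


def verdict(percent_difference: float) -> str:
    """Hypothetical helper: label a single percentage difference."""
    if percent_difference < SAME_AUTHOR_MAX:
        return "same author"
    if percent_difference > DIFFERENT_AUTHOR_MIN:
        return "different author"
    # Anything between the two cut-offs is left undecided.
    return "unclear"


# Percentage differences copied from the tables above.
results = {
    ("Oliver Twist", "Great Expectations"): 68.1,
    ("Oliver Twist", "A Tale of Two Cities"): 60.4,
    ("Great Expectations", "A Tale of Two Cities"): 52.1,
    ("Oliver Twist", "Pride and Prejudice"): 218.0,
    ("Great Expectations", "Pride and Prejudice"): 227.0,
    ("A Tale of Two Cities", "Pride and Prejudice"): 240.0,
    ("Oliver Twist", "Dracula"): 115.0,
    ("Great Expectations", "Dracula"): 127.0,
    ("A Tale of Two Cities", "Dracula"): 112.0,
    ("A Christmas Carol", "Oliver Twist"): 63.0,
    ("A Christmas Carol", "Great Expectations"): 97.8,
    ("A Christmas Carol", "A Tale of Two Cities"): 71.3,
}

for (first, second), diff in results.items():
    print(f"{first} vs {second}: {diff}% -> {verdict(diff)}")
```

With these cut-offs, every novel-to-novel comparison gets a clear label, but two of the three A Christmas Carol comparisons (97.8% and 71.3%) land in the "unclear" gap, which fits the point that shorter texts give less reliable results.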