A few years ago, I calculated some statistics on SyeraMiktayee’s Universal Mary Sue Litmus Test.
This time, I’m back with a specific question: can it predict popularity?
If Mary Sues are bad writing, and the most popular stories tend to be well written, then stories with Mary Sues should be less popular.
To test this, I’m going to score 50 webcomics, and see if “Sueness” and popularity line up.
Data Collection
There are plenty of sites that track popular webcomics. However, they tend to focus on rankings (e.g. “5th most popular comic”). Unfortunately, scoring the 50 best comics won’t tell me much about bad writing.
Instead, I’m going to score 50 new-ish comics.
I’ve found a site, The Webcomic List, which tracks views per month (a good approximation of popularity) and also has a list of recently added comics. With any luck, this will include some terrible ones.
To prove I’m not picking and choosing what I score, I’m going to score the first 50 comics added to that page after Tuesday (12 July). I’ll give each one a month (so there’s a decent amount of content) then start scoring on 12 August.
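For anyone who wants to check the sample later, the selection rule can be sketched in a few lines of Python. This is only an illustration: the listing format, the comic names, and the year are placeholders I made up (the rule only needs a day and month), and I’m assuming “after Tuesday” means strictly after the cutoff date.

```python
from datetime import date


def select_sample(listings, cutoff, size=50):
    """Return the first `size` comic titles added strictly after `cutoff`.

    `listings` is an iterable of (title, date_added) pairs.
    """
    eligible = [(title, added) for title, added in listings if added > cutoff]
    eligible.sort(key=lambda item: item[1])  # earliest addition first
    return [title for title, _ in eligible[:size]]


# Hypothetical listings; YEAR is a placeholder, not a real data point.
YEAR = 2016
listings = [
    ("Comic C", date(YEAR, 7, 15)),
    ("Comic A", date(YEAR, 7, 11)),  # added before the cutoff, so excluded
    ("Comic B", date(YEAR, 7, 13)),
    ("Comic D", date(YEAR, 7, 12)),  # added on the cutoff day, so excluded
]

sample = select_sample(listings, cutoff=date(YEAR, 7, 12), size=2)
print(sample)
```

Because the rule depends only on the date each comic was added, nothing about a comic’s quality can influence whether it ends up in the sample.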
Data Analysis
I’ve got lots of theories I want to try, and consequently lots to record.
For the popularity question, though, only two things matter: the “Sueness” (Mary Sue Test score) of each comic’s highest-scoring (i.e. worst) character, since one Sue can ruin a whole story, and the comic’s average views per month (as of the one-month anniversary, when I score it).
I’ll compare these with a simple linear regression. If I’m right, this will show a negative correlation (higher “Sueness” goes with lower popularity).
If I get a “p-value” of less than 0.05 (i.e. if there were really no link, a result at least this strong would turn up by chance less than 1 time in 20), I’ll consider it strong evidence (though not proof) that the test can predict popularity.
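Here is a rough sketch of that analysis in plain Python, to make the steps concrete. All the numbers are invented for illustration and are not results. The function names are my own, and the p-value is estimated with a permutation test (shuffling the view counts many times to see how often chance alone gives a correlation this strong) rather than the t-test a stats package would normally use for a regression. The two should agree roughly, but they are not the same calculation.

```python
import random


def slope_and_r(xs, ys):
    """Least-squares slope of ys on xs, plus the correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / (sxx * syy) ** 0.5


def permutation_p_value(xs, ys, trials=10_000, seed=0):
    """Two-sided p-value: the share of random shuffles of ys whose
    correlation with xs is at least as strong as the observed one."""
    rng = random.Random(seed)
    observed = abs(slope_and_r(xs, ys)[1])
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(slope_and_r(xs, shuffled)[1]) >= observed:
            hits += 1
    # Count the observed arrangement too, so the estimate is never exactly 0.
    return (hits + 1) / (trials + 1)


# Made-up data: comic -> (test score of each character, average views per month)
comics = {
    "Comic A": ([12, 30, 8], 5200),
    "Comic B": ([45, 20], 900),
    "Comic C": ([5, 9], 7400),
    "Comic D": ([60], 300),
    "Comic E": ([25, 18, 22], 3100),
    "Comic F": ([38, 41], 1500),
}

# One Sue can ruin a whole story, so each comic is judged by its worst character.
sueness = [max(scores) for scores, _ in comics.values()]
views = [monthly_views for _, monthly_views in comics.values()]

slope, r = slope_and_r(sueness, views)
p = permutation_p_value(sueness, views)
print(f"slope = {slope:.1f} views per point, r = {r:.2f}, p = {p:.4f}")
```

A negative slope together with a p-value under 0.05 would be the outcome I’m predicting; anything else counts against the theory.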
What next?
Before I start scoring anything, I need to define the rules I’ll use to do it.
Expect a post on Monday or Tuesday with a list of exactly what I’m going to record, and how I’m going to decide tricky questions.