Sueness and Popularity: Data Collection Protocol

(Continued from yesterday’s Sueness and Popularity post)

This post is a list of all the statistics I’m going to collect on each webcomic. It might not be very entertaining; the point is to stop me fiddling with things half-way through the process.

Comic Details

  • Title
  • Website (somewhere from which the rest of the comic is reachable — preferably the home page)

Author details

One entry for each artist, writer, etc., if there are several. This includes colourists.

  • Full Name, if public (The test I’m using cares about author name, but I don’t want to go stalking people just to get accurate results.)
  • Any other public names, pseudonyms, etc.
  • Date they started contributing to the story (if they weren’t there from the start)

The Webcomic List Statistics

I’ll collect each of these numbers twice: once when the comic is first added, and again a month later when I score it.

If I miss a day, I’ll collect the data for that day as soon as I have time (hopefully not more than a couple of days late).

  • Nominal data-collection date (when I should have collected these stats — date added, or that date plus one month).
  • Actual data-collection date (should usually be the same)
  • Number of comic pages. This includes side stories, title pages, etc., but not non-story content (cast lists, author commentary), or guest comics that don’t contribute to canon. Text-only pages (like Erfworld often has) do count.
  • TWL stats (“views this month”, “average views a month”, “favourite of n members”)
  • Number of TWL comments
  • Number of “accusations” in those comments (posts saying the comic contains a Mary Sue)
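As a rough sketch, one snapshot of these numbers could be stored as a small record. The field names, the `TwlSnapshot` class, and the `add_one_month` helper are all invented for illustration. Clamping to the last day of a shorter month (e.g. 31 January to the end of February) is an assumption; the protocol above only says "plus one month".

```python
import calendar
from dataclasses import dataclass
from datetime import date


def add_one_month(d: date) -> date:
    """Return the same day in the following month.

    Assumption: if next month is shorter, clamp to its last day.
    """
    year = d.year + (1 if d.month == 12 else 0)
    month = 1 if d.month == 12 else d.month + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


@dataclass
class TwlSnapshot:
    """One collection of The Webcomic List numbers (hypothetical field names)."""
    nominal_date: date            # when the stats should have been collected
    actual_date: date             # when they actually were collected
    page_count: int               # story pages only, per the rules above
    views_this_month: int
    average_views_per_month: float
    favourite_of: int             # "favourite of n members"
    comment_count: int
    accusation_count: int         # comments saying the comic contains a Mary Sue

    @property
    def days_late(self) -> int:
        """How many days after the nominal date the data was collected."""
        return (self.actual_date - self.nominal_date).days
```

Each comic would then have two of these: one with the nominal date set to the date added, and one with it set to `add_one_month(date_added)`.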

Character Details

I aim to score every distinct character who appears on more than one comic page (if there’s only one page, I’ll score every character in it). This includes animals, AIs, robots, etc., but not random inanimate objects (e.g. the One Ring isn’t a character).

Interchangeable randoms (e.g. mooks, or most of the stick figures in XKCD) don’t count, but named characters who never appear on panel do (if they’re mentioned on at least two pages).
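The inclusion rules above can be sketched as a single yes/no check. The function name and its parameters are hypothetical, and deciding whether a character is "interchangeable" or "named" is still a human judgment passed in as a flag. One case the rules don't spell out is a character who appears on one page and is mentioned on another; this sketch treats that as "don't score", which is an assumption.

```python
def should_score(
    pages_appeared: int,
    pages_mentioned: int,
    total_pages: int,
    is_named: bool,
    is_interchangeable: bool,
    is_object: bool,
) -> bool:
    """Decide whether a character gets a Mary Sue Test score."""
    # Inanimate objects and interchangeable randoms never count.
    if is_object or is_interchangeable:
        return False
    # A one-page comic: score every character in it.
    if total_pages == 1:
        return pages_appeared >= 1
    # The main rule: appears on more than one page.
    if pages_appeared > 1:
        return True
    # Named characters who never appear on panel count if they are
    # mentioned on at least two pages.
    if pages_appeared == 0 and is_named:
        return pages_mentioned >= 2
    # Assumption: anything else (e.g. a single appearance) is a one-off.
    return False
```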

As well as the score, I’ll collect this information:

  • Name
  • Other names (pseudonyms, superhero names, maiden names, etc.)
  • Gender (“male”, “female”, or “other”), based on how the character is presented in the comic. Robots, aliens, etc. can be male or female if they’re presented as clearly one gender. If the correct gender is complex or unclear, I’ll use “other”.
  • Number of pages with this character (same rules as above).

Mary Sue Test score

(For each non-one-off character, as above.)

I’ll score characters based on supplementary information (commentary, previous comics with that character, etc.), even if I wouldn’t include it in the page count. However, I won’t consider (and will try to avoid reading) anything that was published after the one-month cutoff date.

If a comic is fanfic, I’ll try to research the canon enough to score it correctly — but I won’t necessarily read the whole source canon.

This might mean reading spoilers. If so, I reserve the right to either (a) put off scoring the fanfic until I’ve read the original, or (b) get someone else to score it according to this protocol. However, I don’t plan to do this unless the original is something I really want to read.

A lot of questions ask about the author’s intent or feelings. If I have that information (e.g. from author commentary), I’ll use it. If not, I’ll default to “no” when the question is just about the author’s mind (e.g. “do you see your character as a role model”), or “yes” if there’s a concrete part I can answer “yes” to.

For instance, “does your character have … an exotic name … chosen primarily because” will get a “yes” if the character has an exotic name but I don’t know why it was chosen.
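The defaulting rule for intent questions can be written down as a tiny function, which makes the edge cases easier to see. This is only a sketch: the function and parameter names are made up, and `None` is used here to mean "I don't know" or "this question has no concrete part".

```python
from typing import Optional


def answer_intent_question(
    author_statement: Optional[bool],
    concrete_part_holds: Optional[bool],
) -> bool:
    """Answer a test question that asks about the author's intent.

    author_statement: the answer known from commentary etc., or None
        if the author's intent is unknown.
    concrete_part_holds: whether the observable part of the question
        is true (e.g. "has an exotic name"), or None if the question
        is purely about the author's mind.
    """
    # If the author has said, use that.
    if author_statement is not None:
        return author_statement
    # Otherwise "yes" only when there is a concrete part and it holds.
    return concrete_part_holds is True
```

So the exotic-name example is `answer_intent_question(None, True)`, which gives "yes", while the role-model example is `answer_intent_question(None, None)`, which gives "no".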

If I run into anything else ambiguous, I’ll check SyeraMiktayee’s Ask.fm (but I won’t actually ask about it, as I don’t want to spam xir with questions). If that doesn’t clear it up, I’ll just pick something and document what I did.

Can a Mary Sue Test Predict Popularity? (Experiment Plan)

A few years ago, I calculated some statistics on SyeraMiktayee’s Universal Mary Sue Litmus Test.

This time, I’m back with a specific question: can it predict popularity?

If Mary Sues are bad writing, and the most popular stories tend to be well written, then stories with Mary Sues should be less popular.

To test this, I’m going to score 50 webcomics, and see if “Sueness” and popularity line up.

Data Collection

There are plenty of sites that track popular webcomics. However, they tend to focus on rankings (e.g. “5th most popular comic”). Unfortunately, scoring the 50 best comics won’t tell me much about bad writing.

Instead, I’m going to score 50 new-ish comics.

I’ve found a site, The Webcomic List, which tracks views per month (a good approximation of popularity), and also has a list of recently added comics. With any luck, this will include some terrible ones.

To prove I’m not picking and choosing what I score, I’m going to score the first 50 comics added to that page after Tuesday (12 July). I’ll give each one a month (so there’s a decent amount of content) then start scoring on 12 August.

Data Analysis

I’ve got lots of theories I want to try, and consequently lots to record.

For the popularity question, though, only two things matter: the “Sueness” (Mary Sue Test score) of the highest-scoring (i.e. worst) character, since one Sue can ruin a whole story, and the average views per month (as of the 1-month anniversary when I score the comic).

I’ll compare these with a simple linear regression. If I’m right, this will show a negative correlation (higher “Sueness” equals lower popularity).
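To make the comparison concrete, here is a minimal sketch of the two steps: take the highest character score in each comic, then fit an ordinary least-squares line of views against that score. The function names are hypothetical and the numbers in the usage example are invented purely to show the shape of the calculation; they are not results.

```python
from statistics import mean
from typing import Sequence, Tuple


def comic_sueness(character_scores: Sequence[float]) -> float:
    """A comic's "Sueness" is the score of its highest-scoring character."""
    return max(character_scores)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


# Invented example data: character scores per comic, and average views per month.
scores_per_comic = [[4, 10], [20, 7], [30]]
views = [300, 200, 100]

sueness = [comic_sueness(scores) for scores in scores_per_comic]
slope, intercept = fit_line(sueness, views)
print(slope, intercept)  # a negative slope is what the hypothesis predicts
```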

If I get a “p-value” of less than 0.05 (i.e. if there were no real relationship, a result at least this strong would turn up by chance less than 1 time in 20), I’ll consider it proof (or at least, strong evidence) that the test can predict popularity.
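For anyone who wants to see where a p-value like this comes from, the sketch below estimates one with a permutation test. This is a stand-in technique, not the p-value a regression package would report: it shuffles the popularity numbers many times and counts how often chance alone produces a correlation at least as negative as the real one. The function names, the number of shuffles, the fixed seed, and the one-sided direction (only negative correlations count) are all assumptions.

```python
import random
from math import sqrt
from statistics import mean
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between xs and ys."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def permutation_p_value(
    xs: Sequence[float],
    ys: Sequence[float],
    trials: int = 2000,
    seed: int = 0,
) -> float:
    """One-sided p-value: how often shuffled data is at least as negative."""
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    observed = pearson_r(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties.
        if pearson_r(xs, shuffled) <= observed + 1e-12:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)
```

With 50 comics, `permutation_p_value(sueness, views) < 0.05` would be the check corresponding to the threshold above, under these assumptions.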

What next?

Before I start scoring anything, I need to define the rules I’ll use to do it.

Expect a post on Monday or Tuesday with a list of exactly what I’m going to record, and how I’m going to decide tricky questions.

Goal: Write Goals

Between illness, broken Internet, and various other demands on my time, my focus petered out at the end of the last ROW80 round.

I’d like to say that won’t happen again. But that means something will have to change.

My Internet should be fixed soon, but that may not be enough.

Can I reduce the impact of being sick?

The last few times I’ve been sick like this, it’s taken ages to go away. I suspect this is because I don’t get enough rest. However, setting aside my goals to give myself time to relax doesn’t really work; even if I don’t write, I still stay up late and lose sleep.

For all I know, I’d actually get to sleep sooner (and heal better) if I did keep my writing goals.

What about those other demands on my time?

Some of these are house-related things (e.g. organising repairs). Some are related to things at work. But the majority are social commitments.

When I started ROW80, I basically had one day out a month. I’ve now got fencing every week, and various other social things on three weekends out of four — plus, often, a one-off thing on the fourth.

This probably isn’t that much in an absolute sense, but every extra commitment is one fewer day I can use to work on my own projects (like writing). Combined with being sick (and therefore being behind on everything), this adds up to more pressure than I’m used to.

So, what can I do about this?

Improving my time management would help, but I don’t think that’s going to happen in the short term.

Which means I need to reduce my commitments — either cut down on social situations (and become lonely), cut down on writing (and feel my goal of being a serious writer slip further from my grasp), or cut down on my other hobbies, like programming (I’d like to keep up my skills for what I do at work) and reading.

I’ve considered cutting down on punctuation (especially parentheses), but it’s too great a loss for the time it would save.

If this were an easy choice, I would have made it already.