Judging Results

I like it! And it would be cool if users could also use this method to judge (but keep their scores separate from the real judges’ scores).

With the new method proposed, I feel that being able to put n/a (“Couldn’t run” or the like) is important. Apart from that… Yeah, I’m fine with the idea of having more of a bins sort of rating (1-5 or 1-10 per game).

I’m not sure there’s much point sorting the entries within each bin if we’re talking community vote - that should sort itself out with more than say, 10 or 15 votes. As for judges… that’s trickier. :slight_smile:

A method used rather successfully in the Ludum Dare contest is to present each participant with a judging page that lists the games in random order. People will generally start from the top, so if everybody rates, say, a third of the games, each game will still get a decent number of votes. There are problems with this method as well, but it’s worked reasonably well for them, I’d say. In order to be successful, however, it needs to be coupled with strong encouragement for each participant to judge a few entries.
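For what it’s worth, here is a minimal sketch of how such a per-participant random order could be produced. It is not how Ludum Dare actually implements it; the class name `RandomJudgingOrder`, the method `orderFor`, and the idea of seeding the shuffle with a numeric participant id are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper: gives each participant their own shuffled game list.
final class RandomJudgingOrder {

    /** Returns a shuffled copy of the games; the input list is left untouched. */
    static List<String> orderFor(long participantId, List<String> games) {
        List<String> shuffled = new ArrayList<>(games);
        // Assumption: seeding with the participant id keeps the order stable
        // when the same participant reloads the judging page.
        Collections.shuffle(shuffled, new Random(participantId));
        return shuffled;
    }

    public static void main(String[] args) {
        // Placeholder game names, not real entries.
        List<String> games = Arrays.asList("Game A", "Game B", "Game C", "Game D", "Game E");
        for (long id = 1; id <= 3; id++) {
            System.out.println("participant " + id + ": " + orderFor(id, games));
        }
    }
}
```

Because every participant starts at the top of a different ordering, the first third of each list covers different games, which is what spreads the votes around.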

EDIT: Also, I feel like an idiot for having brought this whole 0-points issue up. Sorry about the mess! I’ve been coming out as all negative here, when all I really wanted was to point out the issue. I would’ve been entirely fine with the results table staying the same and new scoring mechanisms being used next year. With the new scoring table, two of my four games have an unfair advantage because darkfrog generally gave lower scores… :S This sort of problem is the reason we should remove lowest and highest scores for all games or normalize the judges’ results.
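To make the two options concrete, here is a rough sketch of both, under my own assumptions: “normalize” is taken to mean a z-score per judge (subtract that judge’s mean, divide by that judge’s standard deviation), and “remove lowest and highest” is taken to mean dropping one of each per game before averaging. The class and method names are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical helper showing two ways to reduce the effect of a harsh or generous judge.
final class JudgeScoreAdjustments {

    /** Z-score normalization of one judge's scores across all games. */
    static double[] normalize(double[] judgeScores) {
        double mean = Arrays.stream(judgeScores).average().orElse(0.0);
        double variance = Arrays.stream(judgeScores)
                .map(s -> (s - mean) * (s - mean))
                .average()
                .orElse(0.0);
        double stdDev = Math.sqrt(variance);
        double[] normalized = new double[judgeScores.length];
        for (int i = 0; i < judgeScores.length; i++) {
            // A judge who gave every game the same score carries no ranking information.
            normalized[i] = stdDev == 0.0 ? 0.0 : (judgeScores[i] - mean) / stdDev;
        }
        return normalized;
    }

    /** Mean of one game's scores after dropping the single lowest and single highest. */
    static double trimmedMean(double[] gameScores) {
        if (gameScores.length < 3) {
            throw new IllegalArgumentException("need at least 3 scores to drop two of them");
        }
        double[] sorted = gameScores.clone();
        Arrays.sort(sorted);
        double sum = 0.0;
        for (int i = 1; i < sorted.length - 1; i++) {
            sum += sorted[i];
        }
        return sum / (sorted.length - 2);
    }
}
```

With normalization, a judge who scores 50, 60, 70 and one who scores 70, 80, 90 end up contributing identical values, so a game isn’t helped or hurt by which judges happened to be able to run it.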

I’m feeling really guilty right now.

It was well after my bedtime and I wasn’t expressing myself as clearly as I would like. My point was that, leaving aside the issue of which exception is thrown, we’re targeting not one API/framework but several dozen, and we can’t test on all of them. Lurking in the back of my mind also was the fact that in efforts to save a few bytes some people are straying into areas which the spec doesn’t cover clearly. For example, AFAIK the spec for Applet.getGraphics() says nothing about it returning null at some stages in the applet’s lifecycle, but some people found that that was the case with a small number of VMs.

[quote]Since the results have now been updated to ignore zero anyway, what’s to worry about?
[/quote]
Next year, obviously.

darkfrog’s standard deviation for presentation was 10.3, so with less precision nearly everyone would have been lumped together in three buckets under that scheme. In general most people scarcely use the bottom half of a 1-10 scale. I was thinking about something similar to appel’s buckets suggestion, although that needs work to get something quantifiable which can be averaged.
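A quick way to see the lumping effect is to count how many games land in each bucket. This is only a sketch: it assumes whole-number scores on a 0-100 scale and a fixed bucket width, and the class `ScoreBuckets` and its method are invented for the example.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: groups scores into fixed-width buckets and counts each bucket.
final class ScoreBuckets {

    /** Maps each bucket's lower bound to the number of scores that fall into it. */
    static Map<Integer, Integer> histogram(int[] scores, int bucketWidth) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int score : scores) {
            // Integer division rounds down, so 67 with width 10 lands in bucket 60.
            int lowerBound = (score / bucketWidth) * bucketWidth;
            counts.merge(lowerBound, 1, Integer::sum);
        }
        return counts;
    }
}
```

If a judge’s scores only span about 30 points, a width of 10 leaves three occupied buckets no matter how many games there are, which is the problem described above.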

In general there seems to be a moderate “nostalgia bonus” for basing it on a game the judges played in their youth.

Someone else would have brought it up. Relax.

Fair enough.

Seems like that’s already being thought about, so that’s great too.

I won’t be getting involved in the contest again. This sort of stuff afterwards just leaves what was a fun activity with a nasty taste. The only good option is community voting and reviewing.

Kev

Come on, that’s a late April Fools’ joke, isn’t it? It’s just natural to discuss the judging (and possibly problematic scores), and I don’t think this makes the excellent games this year any worse.

Hmm, not sure about this. It should first be tried in parallel with the judges’ votes to see if there are enough community votes/reviews for all games.

Yeah, I agree. This quote always reminds me why I quit hosting:

Thanks appel! The torch is yours!!

Congrats to all game devs who wrote excellent games and kept the 4K fun. For those who bicker about pointless stats, please take that shit to the Flash 4K contest or something. It’s not that big of a deal.

I think having (from next year on) the judges simply list the games from best to… uhm… least best is actually a very good idea. It removes all subjective scoring from the equation and enforces a uniform point system. Having bins as well has the added benefit of distinguishing games into discrete groups, so there could be two AWESOME, a hundred VERY GOOD, ten OK, and two NOT OK.
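One way the ordered lists could be turned into a uniform point system is a Borda-count style tally, where a game gets more points the higher a judge lists it. That specific scheme is my assumption, not something anyone has settled on here, and the names below are made up.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical Borda-style tally over the judges' ordered lists.
final class RankedJudging {

    /**
     * Each ranking lists games from best to worst. With n games in a list,
     * the first gets n points, the second n - 1, and the last gets 1.
     */
    static Map<String, Integer> tally(List<List<String>> rankings) {
        Map<String, Integer> points = new LinkedHashMap<>();
        for (List<String> ranking : rankings) {
            int n = ranking.size();
            for (int position = 0; position < n; position++) {
                points.merge(ranking.get(position), n - position, Integer::sum);
            }
        }
        return points;
    }
}
```

Since every judge hands out exactly the same set of points, nobody can be a “low scorer” or a “high scorer”; only the order they pick matters.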

I don’t mind these discussions about scoring, but I’m very happy appel said there would be no more fiddling.

And don’t quit, kev… don’t be like that.

[edit:]
It seems like mojang.com is down… I can’t find out why until I get home tonight. Did we get slashdotted or something?

[edit edit:]
Nevermind…

I don’t remember which machine I was running yours on. It was either Java 1.6u7 or Java 1.6u12, but I’m not sure which. However, since 1.6u7 is standard on Mac and 1.6u12 is the latest version, I think that, though it sucks for you, it is fair to expect that the game should run without issue. Remember that we are judging a game that was coded, and part of the coding is compatibility. If you haven’t tested your game to run properly on 1.6 or greater then you have to lose points. It sucks that it drops you to zero, but like Chris said, what other score can I give if I can’t play it?

What game was yours? Was yours the one I didn’t put any comment at all in? :o

It does run properly on the versions of 1.6 which were released when I finished it (i.e. up to 1.6u11). u12 came out in the last week of February, and u13 sometime in March, it seems, although I didn’t know it existed until today. The breakage in u12 looks like a bug in Webstart, and u13 seems to have a different bug in Webstart based on the stack trace a friend sent me today.

P.S. Surely the latest version is 1.7?

Interesting, was that what you used for NiGHTS and had problems with as well? Because I tested with 1.6 (and it’s working on 1.6.0_07 32-bit WinXP here), so it sounds like there’s something else going on. ???

I did in general give lower scores, but that was because I kept holding out for a game that stood out as exceptional to give high marks to. Last year I marked everything relatively high and then near the end found a game that completely changed the standard, and I had to go back and move everything else down to differentiate it. Please don’t take that to mean the games were of bad quality. For the most part they were of pretty high quality, and though I didn’t give 100s to anyone, the reason my scores didn’t differ much from game to game is that most of the games were of a generally high quality. However, the fact that I scored everything relatively lower shouldn’t have any impact on the order of results, just the end scores.

Latest stable version of Java is 1.6u12 and though it sucks to have this be an issue you have to be able to support the currently stable version of Java. I would be curious to know what it is in your code that is causing a breakage in Java in u12 though, that’s very odd.

I did about half my testing on a Mac with 1.6u7 and the other half on Windows Vista 64-bit with 1.6u12.

Anyone I gave a zero score because the application failed to run, please feel free to PM me if you’d like me to help you resolve the issue. I would also just like to be able to play your games. :slight_smile:

Not any more - u13 :stuck_out_tongue:
The error message and stack trace from Webstart’s console is up a few posts, but it’s long-winded without being informative.

[quote]I would also just like to be able to play your games.
[/quote]
Try on the Mac.

I was able to play it on the Mac just fine. :slight_smile: Very nice work. Despite requiring two players it was a cleverly done game. :slight_smile:

Some stats:
The games can be divided into three categories: applications launched by JNLP, applets launched by JNLP, and applets embedded in a webpage.
The games as a whole break down 49 :: 11 :: 7 with one game not counted because its host website is timing out so I can’t get its JNLP file to check.
The games for which at least one judge failed to assign a score break down 7 :: 5 :: 0.
By judge:
Chris: 3 :: 0 :: 0
darkfrog/sunsett: 3* :: 4 :: 0
Mark: 2* :: 2 :: 0

* for 1 of these the problem is attributable to problems acquiring the microphone.

I wonder, sunsett, whether you can also play Bridge4k, NiGHTS 4k and Pixeloids4k on the Mac. If so that would be very suggestive of issues with applets in JNLP with u12. I’m also curious as to which version of Java Chris was using.

In general this might feed in to how people approach the applet-only debate on http://www.java-gaming.org/index.php/topic,20142.0.html

I think changing the results was a bad move. Good or bad, fair or not, once published they should be final. It’s kinda unfair to tell people they took 2nd or 3rd place and then take it back.
I also don’t like the idea of removing the highest and lowest judge score; it doesn’t make sense to me (even though one of the judges gave my game probably the lowest positive score of the entire contest while all the others were much more open-handed :P). Except for the 0% issues I think the current judging system is fine. The community voting has one big problem: the best games get all the comments and reviews, and most of the other games get nothing.

java version "1.5.0_16"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_16-b06-284)
Java HotSpot(TM) Client VM (build 1.5.0_16-133, mixed mode, sharing)

OS X of course.

Hi Darkfrog. My game was WorldRallyDriver4K: http://www.java4k.com/index.php?action=games&method=reviews&cid=5#197

Your comment was “Basic racing game. Nice idea, but limited in scope.”, but I didn’t understand what you meant by this. I have a feeling you were referring to the fact that there were only ghost cars and no opponents, but I wasn’t sure. If you have a chance, can you please explain your comment? Were you hoping to see something else in the game?

Also, I was a little disappointed that your score was very different from all the other judges’ scores. Anywhoo, as everyone says, it’s just a fun comp. …although, I’m totally stoked that Desert Bus didn’t beat me! ;D

EDIT: I just worked out that if my game had failed to run for you and you had given me 0, I would jump from 15th place into 5th place. So I’m going to pretend that happened. Yay me! 5th place! Woohoo! 8)

Cheers,
Ranger.