Minecraft Data for Science

Discussion in 'Bukkit Discussion' started by aftersox, Apr 1, 2013.

Thread Status:
Not open for further replies.
  1. I'm a PhD student studying organizations and networks. Lately I've been interested in groups and leadership in virtual worlds.

    I'm looking for data from CoreProtect, LogBlock, or Prism (or some other player-logging plugin). Also, if you use Towny, Factions, or some other group-based plugin, that would be useful to know as well.

    Using this data I can construct a social network of players who share resources, work on the same projects, kill common enemies, etc. Data with this level of detail and temporal resolution is rarely seen in my field, and it has the potential to answer some difficult questions. For instance: Is leadership contagious? How does resource distribution affect group success? How does group integration affect turnover?

    I want to find some server owners who are willing to grant me access to their log data. To protect your players' anonymity, the data must be cleansed of usernames and chat content; the IRB (ethics board) at my university will only allow me to collect the data if it is cleansed this way.

    If you use common plugins, I can help write a SQL or other script to cleanse the data for you. All you would have to do is run it and send me the results. For instance, if you use Prism (I like Prism), the SQL snippet I wrote copies the data to a new table (ignoring non-player data) and then recodes all player names to a consistent ID. After it runs, you would export the cleansed table and send it to an FTP server I have. There wouldn't be much for you to do, except that the process may take a very long time to run (read: hours) if you have a lot of data.

    Full disclosure: I'm a PhD student at the University of Kentucky. Your data would be used solely for research purposes (so I can escape poverty and get my doctorate). Your identity would remain completely protected, and I would not share it with anyone. Any results I publish would be in aggregate (means, regression results, etc.), and no individual player or server would be singled out.

    If you are interested, please let me know. You can reply to this thread, PM me, or email me (see the link above). You would be providing me with a tremendous boost. Thanks!
  2. So how would we know that this is really you? I wouldn't just go out and give my 2 GB of CoreProtect logs to a person I don't even know.
  3. I could post a Reddit-like verification if that would help. Send an email to jesse.fagan (at) uky.edu and we can discuss it. If you have a particularly large server and the cleansing process takes a long time, I might even be able to dig up some research funds to compensate you.

    Keep in mind that the data should reach me completely anonymous: no IPs, no usernames, no chat content, nothing that can be used to identify an individual. The potential harm of sharing such data should be practically non-existent. Let me work on a script for CoreProtect so you can see what I mean.
  4. I'll send you an e-mail when I can get to it. Project sounds promising, though.
  5. If I can find the LogBlock data from my old server, I'd be happy to help.
  6. This sounds like an awesome, interesting study. Support university studies, people. Great things come out of it.
  7. How would I make my LogBlock and Towny data anonymous? I'd totally send my data.
  8. Interesting, one of my projects is to make a log plugin that would log everything to a flat file. It was intended as a sort of "backup", so I could know exactly what happens when I or the other admins are not on the server.
    It could be really data-mining friendly. I already have some experience in data mining (I developed the CMRules, PrefixSpan and Hirate-Yamana algorithms in Java), so I know you can get awesome results with just some big text files :)
  9. I can share a 55,665,874-entry LogBlock table. It was for a recently removed world on my Creative server. It's not a complete record (all entries before the first of December were purged, and banned griefers had their logs removed), but it has most things. The data is in this format:

    Tell me if you're interested in it.
  10. I appreciate you taking the time to write a player-name cleansing script for Prism. However, I should point out that anyone sending Prism data should exclude all player-chat, player-command, player-join and player-quit events. Commands and joins matter most: joins may include IP addresses if you log IPs, and commands may contain passwords (for example, for login plugins).

    Hope you get what you need. If you have something more official about the study, such as university-hosted information on the project and a university email address I can reach you at, I'd consider sharing our data as well. We have about 40 million records as it stands.
  11. I've sent you my HawkEye database. I hope the mix of formats doesn't affect your results.
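A minimal sketch of the Prism cleansing step described in the first post, with the event exclusions suggested in post 10 folded in. It is written in Python against SQLite purely for illustration: the `events` table, its columns (including the `is_player` flag used to skip non-player actions), the `cleansed` output table and the `cleanse` function are all hypothetical, and the real Prism, CoreProtect and LogBlock schemas will not match them. Player names are mapped to integer IDs in random order, so a player keeps the same ID throughout one export while the IDs reveal nothing about the names (an alphabetical assignment would leak the first letters).

```python
import sqlite3

# Prism action names that post 10 says to exclude: chat and commands may hold
# private text or passwords, and joins may include IP addresses.
SENSITIVE_ACTIONS = ("player-chat", "player-command", "player-join", "player-quit")


def cleanse(conn: sqlite3.Connection) -> None:
    """Copy player events from the hypothetical `events` table into a new
    `cleansed` table, dropping sensitive actions and replacing each player
    name with a consistent integer ID.

    Assumed (hypothetical) source schema:
        events(player, action, world, x, y, z, ts, is_player)
    """
    cur = conn.cursor()

    # Assign IDs 1..n to the distinct player names in random order.
    cur.execute(
        "CREATE TEMP TABLE player_ids (id INTEGER PRIMARY KEY, player TEXT UNIQUE)"
    )
    cur.execute(
        "INSERT INTO player_ids (player) "
        "SELECT player FROM (SELECT DISTINCT player FROM events WHERE is_player = 1) "
        "ORDER BY RANDOM()"
    )

    # The exported table carries an ID column instead of a name column.
    cur.execute(
        "CREATE TABLE cleansed (player_id INTEGER, action TEXT, world TEXT, "
        "x INTEGER, y INTEGER, z INTEGER, ts INTEGER)"
    )
    placeholders = ",".join("?" for _ in SENSITIVE_ACTIONS)
    cur.execute(
        "INSERT INTO cleansed "
        "SELECT p.id, e.action, e.world, e.x, e.y, e.z, e.ts "
        "FROM events AS e JOIN player_ids AS p ON p.player = e.player "
        f"WHERE e.is_player = 1 AND e.action NOT IN ({placeholders})",
        SENSITIVE_ACTIONS,
    )

    # The name-to-ID mapping is never exported; drop it once it has been used.
    cur.execute("DROP TABLE player_ids")
    conn.commit()
```

Only the `cleansed` table would then be exported and sent; the temporary name-to-ID table never leaves the server.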
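The first post proposes building a social network from players who work on the same projects. One hedged way to approximate that from cleansed block logs is to treat each chunk (a 16x16 column of blocks) as a stand-in for a project and to link two player IDs once for every chunk in which both changed blocks. The chunk-as-project proxy, the `(player_id, world, x, z)` input tuples and the `co_building_network` function are illustrative assumptions, not the study's actual method.

```python
from collections import defaultdict
from itertools import combinations


def co_building_network(events):
    """Build a weighted, undirected player network from cleansed block events.

    events: iterable of (player_id, world, x, z) tuples (hypothetical format).
    Returns a dict mapping sorted (id_a, id_b) pairs to the number of chunks
    in which both players changed blocks.
    """
    players_by_chunk = defaultdict(set)
    for player_id, world, x, z in events:
        # Floor division also maps negative coordinates to the right chunk.
        players_by_chunk[(world, x // 16, z // 16)].add(player_id)

    weights = defaultdict(int)
    for players in players_by_chunk.values():
        # Every pair of players active in the same chunk gains one unit of weight.
        for a, b in combinations(sorted(players), 2):
            weights[(a, b)] += 1
    return dict(weights)
```

The result is a plain weighted edge list, so it could be loaded into any network-analysis tool; a finer proxy (for example, also requiring the edits to fall within the same time window) would need the timestamp column as well.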