Coding a profanity filter

Discussion in 'Game Development (Technical)' started by Raptisoft, Aug 24, 2011.

  1. Raptisoft

    Indie Author

    Joined:
    Jul 29, 2004
    Messages:
    804
    Likes Received:
    0
    So, a single beta tester uploaded the very first level for a "level sharing" thing I'm putting into the iPhone version of Robot Wants Kitty, and I realize I need a profanity filter.

    Has anyone coded one? In C++? Anyone know a good one? I just want to detect profanity or likely profanity in the level names and reject. And then the kids will make D1cKSuCk3RPhuCKL3V3L and I won't have accomplished anything, but I will be able to filter out the most stupid and most uncreative profaners.
     
  2. jpoag

    jpoag New Member

    Joined:
    Mar 15, 2008
    Messages:
    806
    Likes Received:
    0
    Spam filters aren't perfect. There are lots of edge cases where non-profane words trigger false positives. You could load a dictionary into your program to check for false positives, but that takes a good chunk of time (to load) and a lot of memory (something you probably wouldn't want on an iOS device).

    Any algorithm that you or anyone comes up with will be circumvented by persistent trolls. The best you can hope to do is prefilter and/or prioritize the user submissions for human review (or whitelist specific users).

    Code:
    // Fragment: input, Level, MAX_THRESHOLD, Reject() and Submit() are
    // assumed to be defined elsewhere. Needs <string> and <vector>.
    std::vector<std::string> blacklist;

    // read blacklist file
    blacklist.push_back("badword");
    blacklist.push_back("b4dw0rd");

    int aFlagCount = 0;

    for (unsigned int i = 0; i < blacklist.size(); i++)
    {
        if (input.find(blacklist[i]) != std::string::npos)
            aFlagCount++; // flag for human review
    }

    if (aFlagCount > MAX_THRESHOLD)
        Reject(); // almost definitely spam
    else
        Submit(Level, aFlagCount);
    You should be able to find bad word dictionaries, some with l33t5p34k entries.


    Also, consider using a server-side filter to help presort for human review.
     
    #2 jpoag, Aug 24, 2011
    Last edited: Aug 24, 2011
  3. MadSage

    Original Member

    Joined:
    Sep 23, 2004
    Messages:
    72
    Likes Received:
    0
    I don't have my code to hand - I recall it was quite complex. It was part of a server, so speed was essential and I was using a hash table to access the dictionary.

    The dictionary wasn't particularly large. The entire dictionary was in lowercase. When reading words from the input string, I converted those to lowercase using my own tolower function which was something like this...


    Code:
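    char tolower(char c)
    {
       // Maps the digits '0'..'9' to the letters they imitate in leetspeak
       static const char leetTable[] = "olzeasgtbg";

       if (c >= 'A' && c <= 'Z')
          c |= 0x20;                  // ASCII uppercase to lowercase
       else if (c >= '0' && c <= '9')
          c = leetTable[c - '0'];

       return c;
    }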
    Once that conversion was done, I could look up words in the dictionary, and it avoided having a huge dictionary with all the different l33t5p34k combinations in it.

    Another problem you get is people putting spaces or other characters between letters too, so you have to remove those before doing a lookup. I never did make it completely foolproof, and I doubt anyone could, but it seemed to work pretty well.
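
    A minimal sketch of that idea, not the original server code: fold each character, drop everything that isn't a letter, then do a hash-table lookup. The names foldChar, normalize and isListed are made up for illustration, and std::unordered_set stands in for whatever hash table was actually used.

```cpp
#include <string>
#include <unordered_set>

// Fold one character: uppercase to lowercase, digits to the letters they
// imitate (same table as in the post). Other characters pass through.
char foldChar(char c)
{
    static const char leetTable[] = "olzeasgtbg";
    if (c >= 'A' && c <= 'Z') return static_cast<char>(c | 0x20);
    if (c >= '0' && c <= '9') return leetTable[c - '0'];
    return c;
}

// Fold every character and keep only letters, so separators such as
// spaces, dots and dashes between letters disappear.
std::string normalize(const std::string& input)
{
    std::string out;
    out.reserve(input.size());
    for (char raw : input) {
        char c = foldChar(raw);
        if (c >= 'a' && c <= 'z') out.push_back(c);
    }
    return out;
}

// Hash-table lookup of a normalized token. The dictionary is assumed to
// be stored in lowercase, as described in the post.
bool isListed(const std::unordered_set<std::string>& dictionary,
              const std::string& token)
{
    return dictionary.count(normalize(token)) != 0;
}
```

    Note that stripping separators from a whole level name glues all of its words together, so an exact lookup like this only works per token; on a full name you would need a substring search instead.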
     
  4. jpoag

    jpoag New Member

    Joined:
    Mar 15, 2008
    Messages:
    806
    Likes Received:
    0
    clbuttic

    That's a pretty good idea. I remember years ago reading a paper that used a similar technique to map the genome of chain letters.

    So basically:
    1. Remove known leetspeak
    2. Convert all text to lowercase
    3. Remove all spaces
    4. Search for blacklisted strings
    5. Send submissions to a queue for human review, sortable by score

    The threshold is what will help you reduce the number of submissions to review. Whitelisting & blacklisting users/device IDs will help reduce the workload over time.
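
    The five steps above could be sketched roughly like this. It is only an illustration under assumptions: Submission, normalizeName, scoreName and sortForReview are hypothetical names, and the score is simply the number of blacklist entries found.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct Submission {
    std::string levelName;
    int score;   // number of blacklist entries found in the normalized name
};

// Steps 1-3: map leetspeak digits to letters, lowercase, and drop
// everything that is not a letter (spaces, punctuation).
std::string normalizeName(const std::string& input)
{
    static const char leetTable[] = "olzeasgtbg";
    std::string out;
    for (char c : input) {
        if (c >= '0' && c <= '9') c = leetTable[c - '0'];
        else if (c >= 'A' && c <= 'Z') c = static_cast<char>(c | 0x20);
        if (c >= 'a' && c <= 'z') out.push_back(c);
    }
    return out;
}

// Step 4: count how many blacklist entries occur as substrings.
int scoreName(const std::string& levelName,
              const std::vector<std::string>& blacklist)
{
    const std::string normalized = normalizeName(levelName);
    int score = 0;
    for (const std::string& word : blacklist)
        if (!word.empty() && normalized.find(word) != std::string::npos)
            ++score;
    return score;
}

// Step 5: order the review queue so the highest scores come first.
void sortForReview(std::vector<Submission>& queue)
{
    std::stable_sort(queue.begin(), queue.end(),
                     [](const Submission& a, const Submission& b) {
                         return a.score > b.score;
                     });
}
```

    Anything scoring above the threshold can be rejected outright, and the rest goes to the reviewer in sorted order. The digit mapping is lossy (a name like "Level 2" normalizes to "levelz"), so the normalized string should only be used for scoring, never displayed.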
     
  5. Jamie W

    Original Member Indie Author

    Joined:
    Apr 16, 2006
    Messages:
    1,211
    Likes Received:
    0
    How about allowing users to report bad content? If you get 3 reports from different users, the content (level) is automatically removed. Kinda like a self-moderating community dynamic. You could also have bad users notified or banned if they continue to submit bad content.
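
    A rough sketch of how the counting could work, assuming levels and users are identified by strings; ReportTracker is a hypothetical class, and 3 is just the threshold suggested above. Storing reporter IDs in a set means repeat reports from one user don't count twice.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <unordered_map>

class ReportTracker {
public:
    explicit ReportTracker(std::size_t threshold = 3) : threshold_(threshold) {}

    // Records a report and returns true once the level has been reported
    // by at least `threshold` distinct users.
    bool report(const std::string& levelId, const std::string& reporterId)
    {
        std::set<std::string>& reporters = reportersByLevel_[levelId];
        reporters.insert(reporterId);   // duplicates are ignored by the set
        return reporters.size() >= threshold_;
    }

private:
    std::size_t threshold_;
    std::unordered_map<std::string, std::set<std::string>> reportersByLevel_;
};
```

    When report() returns true, the caller would pull the level. In practice this would live on the server, since a client can't be trusted to count its own reports.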
     
  6. Applewood

    Moderator Original Member Indie Author

    Joined:
    Jul 29, 2004
    Messages:
    3,859
    Likes Received:
    2
    Be careful how you do this. I got banned from Westwood Online for a week back in the day for typing in the word "snigger".

    That's not even roughly a racial slur but a proper word of its own, and it's not funny when you lose access to your addiction for seven days because some knob head doesn't know how to write a profanity filter! :s
     
  7. electronicStar

    Original Member

    Joined:
    Feb 28, 2005
    Messages:
    2,068
    Likes Received:
    0
    Be careful not to ban people for words like "wristwatch", for example.
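
    One way to avoid that kind of false positive is to match whole words only instead of searching for substrings. This is a sketch under that assumption, with a hypothetical function name; the trade-off is that it no longer catches a blacklisted word glued to other letters or split up by spaces.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Returns true only if some whole word of `text` is in the blacklist.
// Words are maximal runs of ASCII letters, compared in lowercase.
bool containsListedWord(const std::string& text,
                        const std::unordered_set<std::string>& blacklist)
{
    std::string word;
    for (std::size_t i = 0; i <= text.size(); ++i) {
        // A trailing space acts as a sentinel that flushes the last word.
        char c = (i < text.size()) ? text[i] : ' ';
        if (c >= 'A' && c <= 'Z') c = static_cast<char>(c | 0x20);
        if (c >= 'a' && c <= 'z') {
            word.push_back(c);
        } else {
            if (!word.empty() && blacklist.count(word) != 0) return true;
            word.clear();
        }
    }
    return false;
}
```

    With this, an innocent word that merely contains a blacklisted string is left alone, because only an exact word match counts.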
     
  8. Richard Nunes

    Richard Nunes New Member

    Joined:
    Nov 10, 2008
    Messages:
    202
    Likes Received:
    2
    Players who use profanity are probably trolling in other ways. I find it unlikely to have an angelic player in-game who is a potty-mouth on forums or in chat. If you accept complaints about player behaviour and profanity flags from other players on the forum, you can correlate the two and begin restricting the player in-game and on the forums until they either clean up their act or quit the game.

    Ideally, the sort of players who are using l33t profanity will find no one is listening to them or giving them attention and they'll look for their own kind elsewhere.

    Perhaps each time a player is identified as using profanity you force them into a penance campaign against 2000 sewer rats with 1HP each and immune to area attacks... force the player to fight each and every rat one-on-one while they're gnawed at by other rats.
     
