Concordia University Geeks Design New Spam Filter

Discussion in 'In The News' started by roundabout, Nov 28, 2012.

  1. roundabout

    roundabout VIP

    Joined:
    Feb 17, 2011
    Messages:
    2,713
    Likes Received:
    154
    Trophy Points:
    63
    Filtering spam: Researchers propose new method to rid inboxes of unwanted email


    Once upon a time, Spam came in a can and could be easily avoided. Nowadays, spam plagues email inboxes around the world, hawking miracle pills and enticing the gullible with tales of offshore bank accounts containing untold fortunes. These once text-based email infiltrators have recently turned high-tech, using layers of images to fool automatic filters.

    Thanks to some sophisticated new cyber-sleuthing, researchers at Concordia University's Institute of Information Systems Engineering are working toward a cure. PhD candidate Ola Amayri and thesis supervisor, Nizar Bouguila, have conducted a comprehensive study of several spam filters in the process of developing a new and efficient one. They have now proposed a new statistical framework for spam filtering that quickly and efficiently blocks unwanted messages.

    "The majority of previous research has focused on the textual content of spam emails, ignoring visual content found in multimedia content, such as images. By considering patterns from text and images simultaneously, we've been able to propose a new method for filtering out spam," says Amayri, who recently published her findings online in a series of international conferences and peer-reviewed journals. Amayri explains that new spam messages often employ sophisticated tricks, such as deliberately obscuring text, obfuscating words with symbols, and using batches of the same images with different backgrounds and colours that might contain random text from the web.

    However, until now, the majority of research in the domain of email spam filtering has focused on the automatic extraction and analysis of the textual content of spam emails and has ignored the rich nature of image-based content. When these tricks are used in combination, traditional spam filters are powerless to stop the messages, because they normally focus on either text or images but rarely both. So how do we stop spam before it sullies our inboxes?

    "Our new method for spam filtering is able to adapt to the dynamic nature of spam emails and accurately handle spammers' tricks by carefully identifying informative patterns, which are automatically extracted from both text and images content of spam emails," says Amayri. By conducting extensive experiments on traditional spam filtering methods that were general and limited to patterns found in texts or images, she has developed a much stronger way, based on techniques used in pattern recognition and data mining, to filter out unwanted emails.
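    The article gives no details of the actual framework, so the following is only a toy sketch of the general idea of scoring text and image patterns together. It is not Amayri and Bouguila's method: it swaps in a plain multinomial naive Bayes classifier, and the image "features" are hypothetical coarse colour buckets computed from a list of (r, g, b) tuples. All function and class names are made up for illustration.

```python
import math
import re
from collections import Counter


def text_tokens(text):
    """Lowercased word tokens, prefixed so they never collide with image tokens."""
    return ["t:" + w for w in re.findall(r"[a-z0-9]+", text.lower())]


def image_tokens(pixels, levels=4):
    """Turn RGB pixels (0-255 tuples) into coarse colour-bucket tokens.

    Hypothetical stand-in for real visual features; an "image" here is
    just a list of (r, g, b) tuples.
    """
    step = 256 // levels
    buckets = {(r // step, g // step, b // step) for r, g, b in pixels}
    return ["i:%d-%d-%d" % b for b in sorted(buckets)]


class JointNaiveBayes:
    """Multinomial naive Bayes over the union of text and image tokens."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = Counter()

    def train(self, label, text, pixels):
        self.docs[label] += 1
        self.counts[label].update(text_tokens(text) + image_tokens(pixels))

    def score(self, label, tokens):
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        total = sum(self.counts[label].values())
        # Log prior plus Laplace-smoothed log likelihood of each token.
        logp = math.log(self.docs[label] / sum(self.docs.values()))
        for tok in tokens:
            logp += math.log((self.counts[label][tok] + 1) / (total + len(vocab) + 1))
        return logp

    def classify(self, text, pixels):
        tokens = text_tokens(text) + image_tokens(pixels)
        return max(("spam", "ham"), key=lambda label: self.score(label, tokens))
```

    The point of the shared token space is that a message with bland text can still be caught by its image tokens, and vice versa, which is the gap the article says text-only or image-only filters leave open.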

    Although the new method has been tested on English spam emails, Amayri says it can be easily extended to other languages. While this new spam-detecting approach is still in the development stage, Amayri and Bouguila are currently working on a plug-in for SpamAssassin, the world's most widely used open-source spam filter. Amayri hopes that this plug-in will allow other researchers to perform further tests and make more progress in the field of spam detection. "Spammers keep adapting their methods so that they can trick the spam filters," says Amayri. "Researchers in this field need to band together to keep adapting our methods too, so that we can keep spam out and focus on those messages that are really important."

    Source:
    http://phys.org/news/2012-11-filtering-spam-method-inboxes-unwanted.html
     
  2. DKPMO

    DKPMO VIP

    Joined:
    Mar 31, 2011
    Messages:
    1,452
    Likes Received:
    68
    Trophy Points:
    48
    Location:
    Elaborate Underground Base
    She is not saying much about the algorithm's/plugin's actual capabilities...
     
  3. roundabout

    One thing I would venture it DOESN'T factor in is complaints. Which is the real tragedy here for email marketers.

    RAZOR, etc. all fingerprint a template after so many mails go out - but shouldn't complaints really be the factor combined with this? Why should any mail be penalized for volume alone? Isn't that guilty before being proven innocent?
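    To make the question concrete, here is a minimal sketch of a rule that flags a template only when volume AND complaint rate both cross a threshold. This is not how Razor works internally; the normalisation, the class name, and both thresholds are assumptions chosen purely for illustration.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(body):
    """Hash a normalised body so small per-recipient changes map to one template."""
    normalised = re.sub(r"\d+", "#", body.lower())       # collapse numbers
    normalised = re.sub(r"\s+", " ", normalised).strip()  # collapse whitespace
    return hashlib.sha1(normalised.encode("utf-8")).hexdigest()


class TemplateTracker:
    """Counts deliveries and complaints per template fingerprint."""

    def __init__(self, min_volume=1000, min_complaint_rate=0.003):
        # Both thresholds are hypothetical placeholders, not real-world values.
        self.min_volume = min_volume
        self.min_complaint_rate = min_complaint_rate
        self.seen = defaultdict(int)
        self.complaints = defaultdict(int)

    def record_delivery(self, body):
        self.seen[fingerprint(body)] += 1

    def record_complaint(self, body):
        self.complaints[fingerprint(body)] += 1

    def is_flagged(self, body):
        fp = fingerprint(body)
        seen = self.seen.get(fp, 0)
        if seen < self.min_volume:
            return False  # too little volume to judge either way
        # Volume alone is not enough: the complaint rate must also be high.
        return self.complaints.get(fp, 0) / seen >= self.min_complaint_rate
```

    Under a rule like this, a high-volume mailing with no complaints is never flagged, which is the behaviour being asked for above.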
     
  4. DKPMO

    I thought all she was talking about is some sort of advanced image analysis / fingerprint extraction.

    Obviously this is just one set of signals (vs. complaints) that has pretty limited value on a standalone basis.

    But I was more interested in how well it actually works... I bet it would be computationally prohibitive to check every image at email delivery time, especially if you have to do some sort of advanced pattern mining.

    This is likely just a research / exploratory project, not something coming to an inbox near you...
     
  5. nickphx

    nickphx VIP

    Joined:
    Apr 2, 2011
    Messages:
    1,139
    Likes Received:
    363
    Trophy Points:
    83
    Gender:
    Male
    Location:
    guadalajara, chiuhuahua
    Razor is based on feedback... Maybe you should read the razor docs, pay attention to the part that mentions 'nominating' emails as spam/ham...
     
  6. roundabout

    That's what they SAY, but I had an incident maybe two years ago where, as a test, I sent 40-50K emails to several of my seed accounts. I was trying to pinpoint when Razor would hit (i.e. after how many emails), and it DID hit in the mid-40Ks of delivery... I assure you I wasn't hitting the SPAM button on myself.
     
  7. Fun4uoc

    Fun4uoc VIP

    Joined:
    Apr 22, 2011
    Messages:
    605
    Likes Received:
    23
    Trophy Points:
    28
    Or were you? muahahahahaha
     
  8. nickphx

    Did you host the receiving mail server yourself or was it third party? You can automate nomination of messages with a simple script..

    I did have this pretty stupid idea of using the ~300k ips we have online to create a metric ton of fake razor clients and nominate lots of spam as spam and my spam as ham.. Haven't put it together yet, can't think of a way to not be painfully obvious.
     
