How to remove hundreds of spam comments from a Drupal site

Suffered a spam attack on your Drupal site?  You are in good company. Even with Mollom installed, your site can be overrun faster than a dog with fleas.

To get rid of the thousands and thousands of spam comments, you have two choices.

Either, delete the comments from your MySQL database

To delete spam directly in your MySQL database, log in to your hosting service and use phpMyAdmin to find the table that holds the comments.  Then truncate that table to clear out all the comments.  Be aware that truncating removes every comment, genuine ones included, so back up the database first.

I haven’t tried this, but proceed logically and it should work.
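
To make the idea concrete, here is a minimal Python sketch of what clearing the table amounts to. It uses the standard library's sqlite3 module as a stand-in for MySQL, and the table name `comment` and its columns are assumptions made up for the demo, not a real Drupal schema, so look up the real table in phpMyAdmin before doing anything. SQLite has no TRUNCATE statement, so the sketch uses DELETE FROM, which empties the table in the same way here.

```python
import sqlite3

def clear_comments(conn, table="comment"):
    """Delete every row from `table` and return how many rows were removed.

    `table` is a hypothetical name; check your own database for the real one.
    """
    cur = conn.cursor()
    # Count first, so you know how much you are about to throw away.
    before = cur.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    cur.execute(f'DELETE FROM "{table}"')
    conn.commit()
    return before

# Demo on an in-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comment (cid INTEGER PRIMARY KEY, subject TEXT)")
conn.executemany(
    "INSERT INTO comment (subject) VALUES (?)",
    [("Cheap pills",), ("Win a prize",), ("Nice post, thanks",)],
)
removed = clear_comments(conn)
remaining = conn.execute("SELECT COUNT(*) FROM comment").fetchone()[0]
print(f"Removed {removed} comments, {remaining} left")
```

Note that the one genuine comment in the demo goes along with the spam, which is the price of emptying the whole table.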

Or, delete comments from your front end using a View

  1. Install two modules: Entity API and Views Bulk Operations
  2. Clear your cache
  3. Make a new View at Structure/Views/ and use Comments as your content
  4. Leave the format as Unformatted and set the number of comments as 500 — check the pager
  5. Continue & Edit
  6. Change Content to Fields
  7. Add a Field for Bulk Operations: Comment and set the value as Delete
  8. Remove the filters unless you want them
  9. Save
  10. Go to the View (e.g., http://yourwebsite.name/spam-control or whatever you called your view)
  11. Start deleting

You need two clicks at the top and you must confirm the list.  It takes a little time, and truncating the table is probably quicker, but this was more satisfying.  The View can also stay in the background to clean up smaller spam attacks in future.

10 steps to build a spam catcher

Here are the ten broad steps to build a spam catcher:

  1. Get a sample of emails that are known to be spam or not spam. Split the sample 60:20:20 to provide a “training” set, a “cross-validation” set and a “test” set.
  2. Turn each email into a list of words by
    • Stripping out headers (if not part of the spam test) and other redundancies
    • Running NLP software to record the stem of a word only (for example, record city and cities as cit)
  3. Count the number of times each unique word appears in the sample and order the list so that we can use the top 100 or 10 000 or 50 000 (whatever) to check for spam.  Remember to use stemmed words!
  4. Convert the list of words in each email into a list of look-up numbers by substituting the row number of the word from the dictionary we made in Step 3.
  5. For each email, make another list where row 1 is 1 if the first word in the dictionary is present in the email, row 2 is 1 if the second word in the dictionary is present in the email, and so on. If the word is not present, leave the value for that row as zero. You should now have as many lists as you have emails, each with as many rows as you have words in your spam dictionary.
  6. Run an SVM algorithm to predict whether each email is spam (1) or not spam (0).  The input is the list of 1s and 0s indicating which words are present in the email.
  7. Compare the predictions with the known values and compute the percentage correct.
  8. Compute the predictions on the cross-validation set and tweak the algorithm depending on whether the cross-validation accuracy is too similar to the training accuracy (suggesting the model could be stronger) or too dissimilar (suggesting the model is too strong).
  9. Find the words most associated with spam.
  10. Repeat as required.
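
Steps 2 to 7 and 9 can be sketched in plain Python as below. This is a toy under stated assumptions, not a finished spam catcher: the stemmer is a hand-rolled stand-in for real NLP software, the SVM is a simple linear one fitted by sub-gradient descent on the hinge loss rather than a library implementation, and the sample emails are invented. The sample is far too small for a 60:20:20 split, so the sketch trains on everything and checks two unseen emails instead; the cross-validation tuning in step 8 is left out.

```python
import re
from collections import Counter

def crude_stem(word):
    """Toy stemmer (an assumption, not real NLP software): strip a few suffixes."""
    for suffix in ("ies", "es", "s", "y"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    """Step 2: turn an email body into a list of stemmed, lower-case words."""
    return [crude_stem(w) for w in re.findall(r"[a-z]+", text.lower())]

def build_dictionary(emails, size):
    """Step 3: keep the `size` most frequent stems across the sample."""
    counts = Counter(stem for email in emails for stem in tokenize(email))
    return [stem for stem, _ in counts.most_common(size)]

def to_features(email, dictionary):
    """Steps 4-5: 1 if the dictionary word is present in the email, else 0."""
    present = set(tokenize(email))
    return [1 if stem in present else 0 for stem in dictionary]

def score(x, weights, bias):
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def train_linear_svm(X, y, epochs=200, rate=0.1, penalty=0.01):
    """Step 6: linear SVM fitted by sub-gradient descent on the hinge loss."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            target = 1 if label == 1 else -1  # hinge loss works with +1/-1
            margin = target * score(x, weights, bias)
            weights = [w * (1 - rate * penalty) for w in weights]  # regularise
            if margin < 1:  # wrong or too close to the boundary: nudge
                weights = [w + rate * target * xi for w, xi in zip(weights, x)]
                bias += rate * target
    return weights, bias

def predict(x, weights, bias):
    return 1 if score(x, weights, bias) > 0 else 0

def accuracy(X, y, weights, bias):
    """Step 7: fraction of predictions that match the known labels."""
    hits = sum(predict(x, weights, bias) == label for x, label in zip(X, y))
    return hits / len(y)

# Made-up sample: (email text, 1 for spam or 0 for not spam).
sample = [
    ("Win free money now", 1),
    ("Free prize, claim your money", 1),
    ("Cheap pills, win big", 1),
    ("Meeting notes from the city council", 0),
    ("Lunch on Friday?", 0),
    ("Notes on the two cities project", 0),
]
texts = [text for text, _ in sample]
labels = [label for _, label in sample]

dictionary = build_dictionary(texts, size=50)
X = [to_features(text, dictionary) for text in texts]
weights, bias = train_linear_svm(X, labels)
train_accuracy = accuracy(X, labels, weights, bias)

# Two unseen emails, also made up.
unseen_spam = predict(to_features("Win free money, claim your prize", dictionary), weights, bias)
unseen_ham = predict(to_features("Notes from the city council meeting on Friday", dictionary), weights, bias)

# Step 9: the stems with the largest weights are the ones most associated with spam.
ranked = sorted(zip(weights, dictionary), reverse=True)
top_spam_stems = [stem for _, stem in ranked[:3]]

print("training accuracy:", train_accuracy)
print("unseen predictions:", unseen_spam, unseen_ham)
print("stems most associated with spam:", top_spam_stems)
```

With a real sample you would build the dictionary and fit the model on the training set only, then use the cross-validation set to tune `penalty` before touching the test set.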

Step 6: Consolidating my online strategy – Spam catcher

I have a self-hosted blog. Now to share!

Image: Meal Worm in Venus Fly Trap, via blmurch

Now that I’ve got a working copy of my blog moved from WordPress.com to a WordPress.org installation on Dreamhost, I am in an uncomfortable interregnum.  I have two parallel copies of the same content on two different domains (http://flowingmotion.wordpress.com and http://flowingmotion.jojordan.org).  That doesn’t matter very much because, though my blog is healthy, it is not very big.

Steps for connecting with the world

But I want to accomplish several steps quickly:

  • Get an anti-spam filter set up on my blog
  • Redirect my links from WordPress.com so that anyone visiting my blog is redirected to the new copy (and in time I recover my “google juice”)
  • Set up my Google Analytics and RSS feeds
  • Prettify my blog with all the additional plugins that I need to function well.

Setting up Akismet

Akismet, WordPress’s anti-spam system, is free for personal and non-commercial blogs.  There are two steps to getting it set up.

1  Activate the plugin that was installed with your One Click Install on Dreamhost

Go to the dashboard; look down the right column; choose Plugins, then Installed; find Akismet and activate it.

2  Get your Key

Follow the link to get your Key.  It did not help that I am now using another email address: I was muddled for a moment and got a new code.  What I needed to do was put in the email address I used for my old WordPress blog and retrieve my old key.

Copy & paste.

All done!

Akismet begins working immediately and you can rejoice at the idiocy of people who waste their time and ours sending out robots to promise personal attention and service!  Hail Akismet.