Saturday, March 28, 2009

De-gnoming your blog

My blog has been filling up with spam over the last few months: nasty, long messages that pollute the comment space and are a pain to delete, because the bastards programmatically stuck them into dozens of posts.

Luckily, Google has APIs for Blogger and, better yet, even provides C# libraries that make coding against these APIs easy. I didn't even have to dust off Eclipse!

So an hour later I had a program that let me clean up the spam quickly.

The code sucks in the entire blog (including the abstracts of posts and comments) and lets you quickly search for and delete spam. Type 'help' at the command prompt, and it will explain the usage.

A few convenient shortcuts:

List or delete all comments from someone:
list|delete comments by name
e.g.
list comments by peter.w

List or delete all comments that have the same author and the same text as a sample comment:
list|delete comments like sample_comment_uri
e.g.
list comments like http://www.blogger.com/feeds/3554166144204741789/3010398536200323405/comments/default/4945114575786176865

List or delete all comments with a phrase in the title or the author name:
list|delete comments with text_string
e.g.
list comments with wow gold

List comments in a blog or a post:
list comments in blog_name|first_words_of_post_title
e.g.
list comments in Back to Microsoft
list comments in 1-800-MAGIC
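
The matching rules behind the "by", "with" and "like" shortcuts can be sketched as plain predicates over an in-memory list. This is only an illustration under stated assumptions: `Item` and `SpamRules` are hypothetical names made up for this sketch, and nothing here talks to Blogger.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a downloaded comment (not a Blogger API type).
public class Item
{
    public string Author;
    public string Text;
    public string EditUri;

    public Item(string author, string text, string editUri)
    {
        Author = author;
        Text = text;
        EditUri = editUri;
    }
}

// Hypothetical helper that mirrors the three matching rules.
public static class SpamRules
{
    // "comments by NAME": the author must match exactly, ignoring case.
    public static List<Item> By(IEnumerable<Item> items, string author)
    {
        return items
            .Where(c => c.Author.Equals(author, StringComparison.CurrentCultureIgnoreCase))
            .ToList();
    }

    // "comments with TEXT": the phrase occurs in the author name or the text
    // (case-sensitive).
    public static List<Item> With(IEnumerable<Item> items, string phrase)
    {
        return items
            .Where(c => c.Author.Contains(phrase) || c.Text.Contains(phrase))
            .ToList();
    }

    // "comments like URI": look up the sample by its edit URI, then keep every
    // comment with the same author (ignoring case) and identical text.
    public static List<Item> Like(IEnumerable<Item> items, string sampleUri)
    {
        List<Item> all = items.ToList();
        Item sample = all.FirstOrDefault(c => c.EditUri == sampleUri);
        if (sample == null)
            return new List<Item>();

        return all
            .Where(c => c.Author.Equals(sample.Author, StringComparison.CurrentCultureIgnoreCase)
                        && c.Text == sample.Text)
            .ToList();
    }
}
```

Note that in this sketch an unknown sample URI simply yields an empty list, so a "like" query with a mistyped URI matches nothing.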


If you're one of the truly insane types that would run code compiled by a random guy from the Internet, the built version is here: http://www.solyanik.com/drop/BlogDeSpammer.zip. Otherwise, the code is below, free as in beer!

Incidentally, it might make a good primer on how to retrieve and manipulate Blogger posts and comments programmatically.
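
The retrieval pattern the program relies on is plain feed paging: fetch a page, consume its entries, and follow the "next" link until there is none. Below is a minimal sketch of that loop with the network taken out. `FeedPage`, `Paging` and the `fetch` delegate are hypothetical stand-ins invented for this illustration, not part of the Google library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical page of results: some entries plus an optional link to the next page.
public class FeedPage
{
    public List<string> Entries = new List<string>();
    public string Next; // null when this is the last page
}

public static class Paging
{
    // Collects the entries of every page, starting at firstUri.
    // 'fetch' stands in for a network call that returns one page per URI.
    public static List<string> ReadAll(string firstUri, Func<string, FeedPage> fetch)
    {
        List<string> all = new List<string>();
        string uri = firstUri;

        while (uri != null)
        {
            FeedPage page = fetch(uri);

            // Stop on a missing or empty page, as the program's loops do.
            if (page == null || page.Entries.Count == 0)
                break;

            all.AddRange(page.Entries);
            uri = page.Next; // null ends the loop
        }

        return all;
    }
}
```

Passing the fetch step in as a delegate keeps the loop testable without a network connection; in the real program that role is played by the service's query call.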


//-----------------------------------------------------------------------
// <copyright>
// Copyright (C) Sergey Solyanik.
//
// This software is in public domain and is "free as in beer". It can be
// redistributed in full or in parts for free and without any preconditions.
// </copyright>
//-----------------------------------------------------------------------
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

using Google.GData.Client;

namespace BlogDeSpammer
{
/// <summary>
/// Comment class. Just a trivial holder of essential comment data.
/// </summary>
class Comment
{
public string Text;
public string Author;
public string EditUri;

public Comment(string text, string author, string editUri)
{
Text = text;
Author = author;
EditUri = editUri;
}
}

/// <summary>
/// Post class. Just a trivial holder of essential post data.
/// </summary>
class Post
{
public string Title;
public string CommentsFeed;
public ICollection<Comment> Comments;

public Post(string title, string commentsFeed)
{
Title = title;
CommentsFeed = commentsFeed;
Comments = null;
}
}

/// <summary>
/// Blog class. Just a trivial holder of essential blog data.
/// </summary>
class Blog
{
public string Name;
public string Feed;
public ICollection<Post> Posts;
public Blog(string name, string feed)
{
Name = name;
Feed = feed;
Posts = null;
}
}

/// <summary>
/// Main functionality.
/// </summary>
class Program
{
/// <summary>
/// Gets collection of blogs belonging to the user.
/// </summary>
/// <param name="blogger"> Blogger service. Must be initialized with credentials.
/// </param>
/// <returns> Collection of blogs. </returns>
private static ICollection<Blog> GetBlogs(Service blogger)
{
List<Blog> blogs = new List<Blog>();

FeedQuery query = new FeedQuery();
query.Uri = new Uri("http://www.blogger.com/feeds/default/blogs");
AtomFeed results = blogger.Query(query);
while (results != null && results.Entries.Count > 0)
{
foreach (AtomEntry entry in results.Entries)
{
string blogFeedLink = null;
foreach (AtomLink link in entry.Links)
{
if (BaseNameTable.ServiceFeed.Equals(link.Rel))
{
blogFeedLink = link.HRef.ToString();
break;
}
}
// Skip blogs without a feed link; a null feed would break GetPosts.
if (blogFeedLink != null)
blogs.Add(new Blog(entry.Title.Text, blogFeedLink));
}

if (results.NextChunk == null)
break;

query.Uri = new Uri(results.NextChunk);
results = blogger.Query(query);
}

return blogs;
}

/// <summary>
/// Retrieves all the posts for the blog.
/// </summary>
/// <param name="blogger"> Blogger service, initialized with credentials. </param>
/// <param name="blog"> The blog whose posts are retrieved. </param>
/// <returns> Collection of posts. </returns>
private static ICollection<Post> GetPosts(Service blogger, Blog blog)
{
List<Post> posts = new List<Post>();

FeedQuery query = new FeedQuery();
query.Uri = new Uri(blog.Feed);
AtomFeed results = blogger.Query(query);
while (results != null && results.Entries.Count > 0)
{
foreach (AtomEntry entry in results.Entries)
{
string commentsFeedLink = null;
foreach (AtomLink link in entry.Links)
{
if ("replies".Equals(link.Rel) &&
"application/atom+xml".Equals(link.Type))
{
commentsFeedLink = link.HRef.ToString();
break;
}
}
Console.Write(".");
if (commentsFeedLink != null)
posts.Add(new Post(entry.Title.Text, commentsFeedLink));
}

if (results.NextChunk == null)
break;

query.Uri = new Uri(results.NextChunk);
results = blogger.Query(query);
}

return posts;
}

/// <summary>
/// Gets comments for the given post.
/// </summary>
/// <param name="blogger"> Blogger service, initialized with credentials. </param>
/// <param name="post"> The post. </param>
/// <returns> Collection of comments. </returns>
private static ICollection<Comment> GetComments(Service blogger, Post post)
{
List<Comment> comments = new List<Comment>();

FeedQuery query = new FeedQuery();
query.Uri = new Uri(post.CommentsFeed);
AtomFeed results = blogger.Query(query);
while (results != null && results.Entries.Count > 0)
{
foreach (AtomEntry entry in results.Entries)
{
Console.Write(".");
comments.Add(new Comment(entry.Title.Text, entry.Authors[0].Name,
entry.EditUri.ToString()));
}

if (results.NextChunk == null)
break;

query.Uri = new Uri(results.NextChunk);
results = blogger.Query(query);
}
return comments;
}

/// <summary>
/// Main. Does all the work.
/// </summary>
/// <param name="args"> Command line parameters. </param>
static void Main(string[] args)
{
if (args.Length != 2)
{
Console.WriteLine("Usage: blogdespammer username password");
return;
}

string username = args[0];
string password = args[1];

Console.Write("Loading");

Service blogger = new Service("blogger", "BlogDeSpammer");
blogger.Credentials = new GDataCredentials(username, password);

ICollection<Blog> blogs = GetBlogs(blogger);
foreach (Blog blog in blogs)
{
blog.Posts = GetPosts(blogger, blog);
foreach (Post post in blog.Posts)
post.Comments = GetComments(blogger, post);
}

Console.WriteLine("\n\nType 'help' for help, 'exit' to quit.");
for (; ; )
{
Console.Write("> ");
string input = Console.ReadLine();
// ReadLine returns null at end of input (e.g. Ctrl+Z or a closed pipe).
if (input == null ||
input.Equals("exit", StringComparison.CurrentCultureIgnoreCase))
break;

if (input.Equals("help", StringComparison.CurrentCultureIgnoreCase))
{
Console.WriteLine(
"Type:\n\texit\n\t\t-- to quit.\n\thelp\n\t\t-- to get this text.\n");
Console.WriteLine(
"\tlist blogs\n\t\t-- to list available blogs.\n");
Console.WriteLine(
"\tlist posts in blog\n\t\t-- to list posts in a blog.\n");
Console.WriteLine(
"\tlist comments in blog\n\t\t-- to list comments in a blog.\n");
Console.WriteLine(
"\tlist comments in post\n\t\t-- to list comments in a post.\n");
Console.WriteLine(
"\tlist comments by author\n\t\t-- to list comments by " +
"a particular author.\n");
Console.WriteLine(
"\tlist comments like commenturi\n\t\t-- to list comments " +
"identical to a comment.\n");
Console.WriteLine(
"\tlist comments with text\n\t\t-- to list comments " +
"containing text segment.\n");
Console.WriteLine(
"\tdelete comments by author\n\t\t-- to delete all " +
"comments by a particular author.\n");
Console.WriteLine(
"\tdelete comments like commenturi\n\t\t-- to delete " +
"all comments that are the same as a " +
"comment with this URI.\n");
Console.WriteLine(
"\tdelete comments with text\n\t\t-- to delete comments " +
"containing text segment.\n");
Console.WriteLine("Names above could be either text names or URIs.");
continue;
}

if (input.Equals("list blogs",
StringComparison.CurrentCultureIgnoreCase))
{
foreach (Blog blog in blogs)
Console.WriteLine("{0} ({1})", blog.Name, blog.Feed);
continue;
}

if (input.StartsWith("list posts in ",
StringComparison.CurrentCultureIgnoreCase))
{
string blogname = input.Substring(14);
foreach (Blog blog in blogs)
{
if (blog.Name.StartsWith(blogname) ||
blog.Feed.Equals(blogname))
{
foreach (Post post in blog.Posts)
Console.WriteLine("{0} ({1})", post.Title,
post.CommentsFeed);
}
}
continue;
}

if (input.StartsWith("list comments in ",
StringComparison.CurrentCultureIgnoreCase))
{
string name = input.Substring(17);
foreach (Blog blog in blogs)
{
if (blog.Name.StartsWith(name) || blog.Feed.Equals(name))
{
foreach (Post post in blog.Posts)
{
Console.WriteLine("{0} {1}", post.Title,
post.CommentsFeed);
foreach (Comment comment in post.Comments)
Console.WriteLine("-- {0} ({1})\n{2}\n",
comment.Author, comment.EditUri,
comment.Text);
}
break;
}

foreach (Post post in blog.Posts)
{
if (post.Title.StartsWith(name) ||
post.CommentsFeed.Equals(name))
{
Console.WriteLine("{0} {1}", post.Title,
post.CommentsFeed);
foreach (Comment comment in post.Comments)
Console.WriteLine("-- {0} ({1})\n{2}\n",
comment.Author, comment.EditUri,
comment.Text);
}
}
}
continue;
}

if (input.StartsWith("list comments by ",
StringComparison.CurrentCultureIgnoreCase))
{
string author = input.Substring(17);
foreach (Blog blog in blogs)
{
foreach (Post post in blog.Posts)
{
foreach (Comment comment in post.Comments)
{
if (comment.Author.Equals(author,
StringComparison.CurrentCultureIgnoreCase))
{
Console.WriteLine("{0} : {1}",
blog.Name, post.Title);
Console.WriteLine("-- {0} ({1})\n{2}\n",
comment.Author, comment.EditUri,
comment.Text);
}
}
}
}
continue;
}

if (input.StartsWith("list comments with ",
StringComparison.CurrentCultureIgnoreCase))
{
string text = input.Substring(19);
foreach (Blog blog in blogs)
{
foreach (Post post in blog.Posts)
{
foreach (Comment comment in post.Comments)
{
if (comment.Author.Contains(text) ||
comment.Text.Contains(text))
{
Console.WriteLine("{0} : {1}",
blog.Name, post.Title);
Console.WriteLine("-- {0} ({1})\n{2}\n",
comment.Author, comment.EditUri,
comment.Text);
}
}
}
}
continue;
}

if (input.StartsWith("list comments like ",
StringComparison.CurrentCultureIgnoreCase))
{
string prototypeUri = input.Substring(19);
Comment prototypeComment = null;

foreach (Blog blog in blogs)
{
foreach (Post post in blog.Posts)
{
foreach (Comment comment in post.Comments)
{
if (prototypeUri.Equals(comment.EditUri))
{
prototypeComment = comment;
goto found;
}
}
}
}
continue;

found:
foreach (Blog blog in blogs)
{
foreach (Post post in blog.Posts)
{
foreach (Comment comment in post.Comments)
{
if (comment.Author.Equals(prototypeComment.Author,
StringComparison.CurrentCultureIgnoreCase) &&
comment.Text.Equals(prototypeComment.Text))
{
Console.WriteLine("{0} : {1}", blog.Name, post.Title);
Console.WriteLine("-- {0} ({1})\n{2}\n",
comment.Author, comment.EditUri,
comment.Text);
}
}
}
}
continue;
}

if (input.StartsWith("delete comments by ",
StringComparison.CurrentCultureIgnoreCase))
{
string author = input.Substring(19);

foreach (Blog blog in blogs)
{
foreach (Post post in blog.Posts)
{
List<Comment> deleted = new List<Comment>();

foreach (Comment comment in post.Comments)
{
if (comment.Author.Equals(author,
StringComparison.CurrentCultureIgnoreCase))
{
Console.WriteLine("{0} : {1}", blog.Name, post.Title);
Console.WriteLine("-- {0} ({1})\n{2}\n",
comment.Author, comment.EditUri,
comment.Text);
blogger.Delete(new Uri(comment.EditUri));
deleted.Add(comment);
}
}

foreach (Comment comment in deleted)
post.Comments.Remove(comment);
}
}
continue;
}

if (input.StartsWith("delete comments like ",
StringComparison.CurrentCultureIgnoreCase))
{
string prototypeUri = input.Substring(21);
Comment prototypeComment = null;

foreach (Blog blog in blogs)
{
foreach (Post post in blog.Posts)
{
foreach (Comment comment in post.Comments)
{
if (prototypeUri.Equals(comment.EditUri))
{
prototypeComment = comment;
goto found;
}
}
}
}
continue;

found:
foreach (Blog blog in blogs)
{
foreach (Post post in blog.Posts)
{
List<Comment> deleted = new List<Comment>();

foreach (Comment comment in post.Comments)
{
if (comment.Author.Equals(prototypeComment.Author,
StringComparison.CurrentCultureIgnoreCase) &&
comment.Text.Equals(prototypeComment.Text))
{
blogger.Delete(new Uri(comment.EditUri));
deleted.Add(comment);
Console.WriteLine("{0} : {1}", blog.Name, post.Title);
Console.WriteLine("-- {0} ({1})\n{2}\n",
comment.Author, comment.EditUri,
comment.Text);
}
}
foreach (Comment comment in deleted)
post.Comments.Remove(comment);
}
}
continue;
}

if (input.StartsWith("delete comments with ",
StringComparison.CurrentCultureIgnoreCase))
{
string text = input.Substring(21);
foreach (Blog blog in blogs)
{
foreach (Post post in blog.Posts)
{
List<Comment> deleted = new List<Comment>();

foreach (Comment comment in post.Comments)
{
if (comment.Author.Contains(text) ||
comment.Text.Contains(text))
{
blogger.Delete(new Uri(comment.EditUri));
deleted.Add(comment);
Console.WriteLine("{0} : {1}", blog.Name, post.Title);
Console.WriteLine("-- {0} ({1})\n{2}\n",
comment.Author, comment.EditUri,
comment.Text);
}
}

foreach (Comment comment in deleted)
post.Comments.Remove(comment);
}
}
continue;
}

Console.WriteLine(
"Can't parse. Please use 'help' to look up valid commands.");
}


Console.WriteLine("Done!");
}
}
}

5 comments:

Илья Казначеев said...

(staring at the comment above)
FAIL!

BadTux said...

Easier way to de-gnome your blog: Turn on the 'mail comments to email address' settings in the Blogger dashboard, direct new comment postings to your inbox, and just zap the comments manually as they get posted. Problem solved.

Oh, also turn off the 'anonymous poster' options, that stops 95% of the comment spam right there, at the expense of requiring posters to get a free Blogger or OpenID account. But the way I see it, my blog is my property and nobody has a right to comment anonymously on my blog, so (shrug).

Sergey Solyanik said...

The problem with manually administering the comments is that spammers post literally hundreds of them. I tried it at first, and quickly came to the conclusion that it's easier to solve this programmatically.

As for anonymous comments - it's annoying to get a Blogger ID just to comment once. One more account, one more password to remember :-(. I think roughly half of the comments on this blog end up anonymous...

BadTux said...

So how's that de-gnoming going? :)

Anonymous said...

Pretty interesting site you've got here. Thanks for it. I like such themes and everything that is connected to this matter. I would like to read more soon.

Best regards
Darek Wish