Extending Linq - Duplicates

Linq is easy to extend, but the right way to do it is to rely on the existing infrastructure. As an example, let's take a Duplicates extension that returns all duplicates from a collection.

One example is here. For convenience, I'm copying the code below.

    public static HashSet<T> Duplicates<T>(this IEnumerable<T> source)
    {
        // Everything seen so far, and the subset seen more than once.
        HashSet<T> items = new HashSet<T>();
        HashSet<T> duplicates = new HashSet<T>();
        foreach (T item in source)
        {
            // Add returns false when the item is already in the set.
            if (!items.Add(item))
                duplicates.Add(item);
        }
        return duplicates;
    }

Yes, this code works, but only in memory. Because it accepts IEnumerable<T> and walks the sequence with foreach, it does not generate SQL when called on a database query. Every row is fetched first, and the duplicates are then found in memory.
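To make the "in-memory" point concrete, here is a small sketch. It assumes an array wrapped with AsQueryable() as a stand-in for a real database query (no actual provider is involved), and the Program class is only scaffolding for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateExtensions
{
    // The in-memory implementation discussed above.
    public static HashSet<T> Duplicates<T>(this IEnumerable<T> source)
    {
        var items = new HashSet<T>();
        var duplicates = new HashSet<T>();
        foreach (T item in source)
        {
            if (!items.Add(item))
                duplicates.Add(item);
        }
        return duplicates;
    }
}

public static class Program
{
    public static void Main()
    {
        // Assumption: AsQueryable() over an array stands in for a database table.
        IQueryable<int> query = new[] { 1, 2, 2, 3, 3, 3 }.AsQueryable();

        // The extension only sees IEnumerable<int>, so foreach enumerates the
        // whole source; nothing is added to the query's expression tree.
        HashSet<int> result = query.Duplicates();

        Console.WriteLine(string.Join(", ", result.OrderBy(n => n))); // prints "2, 3"
    }
}
```

The return type matters too: a HashSet<T> is a finished result, so nothing can be composed onto it as part of the original query.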

Let's compare this solution to the one below:

    public static IQueryable<TSource> Duplicates<TSource>(this IQueryable<TSource> source) {
        if (source == null)
            throw new ArgumentNullException(nameof(source));
        // Taking IQueryable<TSource> makes Where and Count bind to Queryable, so the
        // query stays an expression tree. Whether the nested Count translates to SQL
        // depends on the provider.
        return source.Where(x => source.Count(y => y.Equals(x)) > 1);
    }

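This is much easier to implement (effectively one line of code) and builds on the existing Linq infrastructure. Note that, unlike the HashSet version, it returns every occurrence of a duplicated element rather than one entry per duplicated value. I believe it could be optimised, but you get the point: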
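As one possible direction for the optimisation mentioned above, here is a minimal sketch built on GroupBy. The name DistinctDuplicates is hypothetical, and the sketch assumes the caller already has an IQueryable<TSource>. Whether a given provider translates it to SQL is not guaranteed; the example only runs it over an in-memory array.

```csharp
using System;
using System.Linq;

public static class QueryableDuplicateExtensions
{
    // Hypothetical helper: one entry per value that occurs more than once.
    public static IQueryable<TSource> DistinctDuplicates<TSource>(this IQueryable<TSource> source)
    {
        if (source == null)
            throw new ArgumentNullException(nameof(source));

        return source
            .GroupBy(x => x)            // group equal elements together
            .Where(g => g.Count() > 1)  // keep groups with at least two members
            .Select(g => g.Key);        // project each group back to its value
    }
}

public static class Program
{
    public static void Main()
    {
        // Assumption: an in-memory array stands in for a real query source.
        IQueryable<int> numbers = new[] { 1, 2, 2, 3, 3, 3 }.AsQueryable();

        Console.WriteLine(string.Join(", ", numbers.DistinctDuplicates().OrderBy(n => n))); // prints "2, 3"
    }
}
```

Grouping avoids comparing every element against the whole sequence and returns each duplicated value once, which matches the behaviour of the HashSet version.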

  1. Don't rush to reinvent the wheel.
  2. Follow the standard Linq convention and use the IQueryable interface whenever possible.