Bulk Delete

Maybe it looks like I am having a personal vendetta against OR mappers but this is not the case. In the last week I have seen a otherwise “good” applications performance go overboard because of a misuse of an OR mapper. So I just want to share my findings and maybe save someone, somewhere, some time.

My poison today was the process of deleting lot of records from a simple table. A simple task if you care to get your fingers dirty in SQL, regardless of the flavor.

Let’s say that you have a table of users and you developed a strong hate for the letter ‘H’ (or any other letter in the alphabet). So you woke up with a burning hate of the letter ‘H’ and you can not but delete all users which name starts with the letter ‘H’.

All you need to do is to execute this simple query:

DELETE FROM Users
WHERE UserName LIKE 'H%'

And done…

I agree that this scenario is a little…simple, but even if the structure of the database schema would be a little more complicated you could still do it with very few queries.

Now lets take a look at how this happens with a OR mapper.

With an OR mapper we first have to get a hold of the users we want to delete. The easiest (and most common) way to do this is to load them into memory. That meas that a sql query will be fired to read all users that we are going to delete.

SELECT * FROM Users WHERE UserName LIKE 'H%'

Note: Please take into consideration that this query does not happen if the users are already loaded into memory.

So now you have them in memory. Now you can call Delete on every one of them. Te default behavior of any OR mapper is to fire of individual delete statement, one for each user you want to delete.

DELETE FROM Users WHERE UserId = 1
DELETE FROM Users WHERE UserId = 2
...
DELETE FROM Users WHERE UserId = 5000

There is no arguing that this is less efficient that doing all of it in one big delete statement. The situation gets even worse if there are triggers or a cascade delete 😦

But before you start fainting stop think. I found a solution, sadly only for Entity Framework. For the rest I just bypass the OR mapper and use the underlying connection to directly fire SQL statements (how this is done is specific with the OR mapper used).

This breaks the “database agnostic” advantage that OR mapper give you. But to be honest   I think that this advantage is a myth, so I lose nothing here. And besides in all the SQL flavors the vanilla delete statement is the same.

So this is another problem I have with OR mappers and my solution to the same problem. Please keep in mind that this becomes a problem with massive data quantities. What “massive” means for your application is a question that can only be individually answered.

But before I go I want to share my latest story. There was an operation in an application I work on that fired 80k to 100k requests of those 80% to 90% were delete statement. Before I started the operation took 50 minutes and would often fail because of timeouts. After the “optimization” the same operation working with the same data finished in 4 to 6 seconds.

User.Deleted

Interesting title, is it not?

But let me explain. I have noticed, lately, that people have stopped using their brains and have given their data storage over to heuristics. What I mean with that? I mean that more and more people are using their OR mapper to generate their DB schema. And I think that this is not the greatest idea in the world. Maybe this is just me but I hope not.

Point in case is the title.

In the databases “Users” table there may be a column that is labeled Deleted. What this column is doing is telling you that the users is actually deleted. You might be asking yourself that if the user is deleted then why just not delete the user record. Well there are many reasons but the top one is data-integrity. There are ways to work around this but if not pressed by the law we tend to not do it.
For those still in the dark here is a quick recap of why this is done. When the user entry is created we tend to bind other data in the database to that user. When the user then decides to leave our system we tend to keep some of the data he has provided or data that accumulated as a result of him using the system. To keep the database integrity existent we eighter have to reassign all the data to another “bogus” user or just mark the user as nonexistent. We obviously go for the later 🙂

So the Users.Deleted makes sense in the context of a database but how about in the context of your domain model? In the data domain this makes little sense because if the user is deleted (non-existed for the application) the user should not be loaded into the data domain.

This is just one small example where the data domain and the database schema are not equal. There are countless others and their number will increase with the complexity of the application.

ValyJs Why?

New job, new application,new ideas!

After getting the hang of the application I am taking care of I star noticing the same code repeating. And after taking a look at some other web pages I see the same things repeating.

As a fair warning we are talking about javascript here so the work code is and will be used in its widest definition.

function foo(){
  var element = $("elementId");
  if(element != null){
    //do something
  }
}

or the classic one where an anonymous object is passed to a function and they check for each and every member of the object…

function bar(user){
  if(user.name){
    // use the users name
  }
  // dosomething else
  if(user.surname){
    // now it's safe to use the surname
  }
}

And now imagine that a million times over all the files that compose the client site part of the application.

Yes that makes Dejan quite unhappy 😦

So this is where ValyJs comes into play. I would not call it a complete product but I wrote up some helper functions that can aid those common scenarios.

Now you can write it like this.

function foo(){
  valy.executeIfExists({
    elementId : "elementId",
    func : function(element){
      // do your stuff here
    }
  });
}

 

Is this better? I think that it is more readable if nothing else. If nothing else it is consistent.

For the second scenario. We can check for all members up front.

function bar(user){
  valy.exists(user.name);
  valy.exists(user.surname);

  // do your stuff
} 

I to hope that this is better.

There is still a long way to go till I get this to version 1, currently I have it at version 0.1.1 and it is working.

What I need now is some creative input to get something that is actually usable.

So anyway you can go and check it out here: http://code.google.com/p/valjjs/

 

Day 1 – Pex and Moles

Today was the first day of  my .NET 4.0 training. And I would like to share the highlights of each day. Well at least the highlights for me. So this is the first day of 5.

Introductions and first days are always slow. So this was no exception. There were some basic introduction things for people who have newer came into contact with electricity. But a good 5 hours in something was presented that caught my eye: Pex and Moles.

Pex and Moles are actually two separate peaces of software:

  • Pex automatically generates test suites with high code coverage.
  • Moles allows to replace any .NET method with a delegate.

I am still unsure about the real day-to-day value that moles will offer me. This is not because the software is not up to pair with what I would use but that I am mostly working on code I can change and refactor so that the need for such a tool is not needed. But more on that later on.

First you will need a copy of Pex and Moles. So here is the download link. You can even find a version for the express version (non-commercial). After you have the file just let the installer do its work an tolerate the 2 ~ 3 times your focus will be stolen (it is worth it).

First you need some code to let pex have fun with. I just quickly wrote a little class with one method. And here it is:

namespace PexAndMoles
{
    public class Calculator
    {
        public int Add(int one, int two)
        {
            if(one == 0 || two == 0)
                throw new ArgumentException();

            if(one < two)
                throw new ArgumentException();

            return one + two;
        }
    }
}

There is a reason for all those ifs in there. It is for the sole reason to give pex something to work on 🙂

So to get started just left-click on the method you want to “work on”. You should see something like this.

Run PEX

Pex will ask you which testing framework it should use. You can choose from all the major testing framework. But to keep it simple I chose to stick with MSUnit.

Select testing framework

After a short time where you are tempted by a “follow us on Facebook” link the results are presented and if you are lucky (depending on the code complexity) pex will find all major test scenarios for your method.

In my case this is what it came up with.

Pex results

And those are all the test scenarios I wanted (or even expected).

Now that you have your tests you want to keep them for later (most probably some sort of regression testing). So pex can help you there to. If you select all created “results” a “Promote…” button will appear. If pressed it adds the “results” as unit tests into your testing project or creates a new one and adds them there.

The code generated is confusing at worst and funny at best. It is not the go-to example of good/clean code. But it is auto-generated and can be regenerated if future changes break the tests. The naming convention is “acceptable”. Before I rant too much here is the code generated:

namespace PexAndMoles
{
    [TestClass]
    [PexClass(typeof(Calculator))]
    [PexAllowedExceptionFromTypeUnderTest(typeof(ArgumentException), AcceptExceptionSubtypes = true)]
    [PexAllowedExceptionFromTypeUnderTest(typeof(InvalidOperationException))]
    public partial class CalculatorTest
    {
        [PexMethod]
        public int Add(
            [PexAssumeUnderTest]Calculator target,
            int one,
            int two
        )
        {
            int result = target.Add(one, two);
            return result;
            // TODO: add assertions to method CalculatorTest.Add(Calculator, Int32, Int32)
        }
        [TestMethod]
        [ExpectedException(typeof(ArgumentException))]
        public void AddThrowsArgumentException547()
        {
            int i;
            Calculator s0 = new Calculator();
            i = this.Add(s0, 0, 0);
        }
        [TestMethod]
        [ExpectedException(typeof(ArgumentException))]
        public void AddThrowsArgumentException81()
        {
            int i;
            Calculator s0 = new Calculator();
            i = this.Add(s0, 1, 0);
        }
        [TestMethod]
        public void Add520()
        {
            int i;
            Calculator s0 = new Calculator();
            i = this.Add(s0, 1, 1);
            Assert.AreEqual<int>(2, i);
            Assert.IsNotNull((object)s0);
        }
        [TestMethod]
        [ExpectedException(typeof(ArgumentException))]
        public void AddThrowsArgumentException470()
        {
            int i;
            Calculator s0 = new Calculator();
            i = this.Add(s0, 2, 3);
        }
    }
}

And that is pex in a nutshell. At least that is what I was able to find out about it in the half day I spend with it.

I know that Moles was not mentioned here but I would like to spend some more time with it before writing about it in more detail, besides the post is long enough.

And that is all the time I have today.

Thx for your time.

I love nHibernate but…

There are just to many assemblies to include to get it running. In my little fantasy idealistic world the main features of nHibernate would be contained in one assembly that would not clog up the references if the project and make it dependent upon to many things.

Before you will burn me on the stake for taking a swing on your favorite OR-mapper hear me out. In the old days where people/software were happy if they could replace their 50+ lines of ADO.NET code with 5 XML files nHibernate was the bomb. All that you needed was one assembly and you were done. Today the situation is somewhat different. I personally expect an OR mapper to deliver:

  • Basic OR mapping functionality
  • Fluent configuration
  • A Linq provider
The situation now is that each of this is a separate assembly. And I do not like this. This means that I must have three assemblies where only one should be, making my projects “bloated” and my bin folder looking like a dump.
I do not see a reason why this can not be included in the main distribution of nHibernate and my hopes are high that this will be the case in the near future.

Log4Net custom LayoutPattern

A few days ago I had a nice little task at my workplace: To add a custom “key” to the log4net pattern layout.

I like log4net and was quite happy to be given time to dive into the belly of the beast. But I quickly noticed that most of the information on the web is outdated and simply wrong with the current version.

The most useful information I got was from a blog post from Scot Hanselman http://www.hanselman.com/blog/CreatingYourOwnCustomPatternLayoutPatternParserAndPatternConvertorWithLog4net.aspx. Sadly log4net has moved along since that blog was written.

Where I failed in the current version was the fact that the PatternParser class is sealed thus making any inheritance attempts void.

After some Googleing and reading the log4net SDK I found the solution to be simple and elegant at the same time. All you have to do is to create two classes:

  • PatternConverter
  • PatternLayout

And you are done.

The first class you have to implement is a PatternConverter which will actually handle the new key you want to add. The implementation is straight forward and easy to do. An example would look like this:

using System.IO;
using log4net.Util;
namespace MyCustomLog4NetPattern
{
  public class CustomPatternConverter : PatternConverter
  {
    protected override void Convert(TextWriter writer, object state)
    {
      writer.Write("Message to add");
    }
  }
}

The object that gets passed in is actually the LoggingEvent that has triggered the log operation. Sou if you need additional resources from there they are available. Additionally to this you can access all the environment variables that you thing should be placed in the log.

A little word of advice: Do not apply any formatting to the string you write. So no newlines or any other fancy formating.

After that is implemented we can move on to the PatternLayout. This is again an almost empty inheritance story. Which will look like this:

using log4net.Layout;
namespace MyCustomLog4NetPattern
{
  public class CustomPatternLayout : PatternLayout
  {
    public CustomPatternLayout()
    {
      AddConverter(new ConverterInfo{Name = "your_key", Type = typeof(CustomPatternConverter )});
    }
  }
}

And that is it! You are done. Every time that the string %your_key appears in the appender layout your little converter will be called and the placeholder replaced.

If now you want to use your custom “key” on a rolling file appender the configuration would look like this:

<appender name="RollingLogFileAppender" type="log4net.Appender.RollingFileAppender">
  <file value="Log/log.txt" />
  <appendToFile value="true" />
  <rollingStyle value="Composite" />
  <datePattern value="yyyyMMdd" />
  <maxSizeRollBackups value="10" />
  <maximumFileSize value="1MB" />
  <layout type="MyCustomLog4NetPattern.CustomPatternLayout, MyCustomLog4NetPattern">
    <conversionPattern value="%date [%thread] %-5level %logger - %message%newline - My stuff: %your_key%newline" />
  </layout>
</appender>

Ant that is all that there is to this. easy and simple solution.

Hope that this will save somebody some time in the future 🙂

Localization! If I tell you English then give me English

I know that the next logical step in web pages is to deliver more and more localized content. But i think that some companies and developers are taking to many shortcuts to achieve this! The most aggravating ones is defining the users preferred language based on his location! This I just wrong! And in this post I will explain you why I think that this is wrong and how to do it correctly!

Before we start a little word of explanation: I prefer my web sites in English (regardless of my mother tongue) but sadly my IP address belongs to an Austria ISP. So naturally all “geo-aware” websites know where my computer sits. OK, this is not bad but what drives me mad is what they do with that information.

Now that that is done lets start. So I fire up my browser and navigate to a geo-aware website, like IGN.com. so what do I see on my screen when the page has loaded? I see:

  • Local news…OK
  • Local release dates…OK
  • The adds are also local…OK
  • Everything on the freking page is in German…NOT OK

I am quite happy with the location related information but does my location really tell anything about my language preferences? The simple answer is NO IT DOES NOT! But how should the web page know your language preferences if it does not know you? Someone could ask at this time. The answer to this is so simple that 80% of web pages tend to ignore it:

The web page does not but my browser knows!

Each browser on the face of this earth send the desired language preferences to the server when making an request! Why this simple fact is ignored i do not understand! Here is the screen that proves it!

Language settings dialog in Firefox

Firefox language settings

So this “preference” gets send to the web server with every request I do! Take this and give me the page in the language I would like to see! I do not care about the adds and the offers but the actual page content could be displayed in the language I desire (If it exists). I do not expect this to be true for small local sites that are only published with one language. But for big multilingual sites that have all the content in English to show it in my location language just because my IP says so is just mocking the user!

If the content that I want is there then please give it to me!

Making it the right way is not that hard! I will take .NET with C# to make my point, but it is easily possible in all other languages.

If you are in a web application you can simply query the HttpContext for the data you need:

string[] languages = HttpContext.Current.Request.UserLanguages;

And that is it! Now you have the names of the users preferred cultures (this includes the language he wants). Just pick the first one and use that to resolve all your multi language resources.

Is that really that hard?