What’s that picture?

In case you were wondering what the picture is in the current banner on this blog, that’s me on a hike in a semi-remote region of Western India (Maharashtra).  Outside of Mumbai, there is a hilltop community named Matheran that has many hiking trails with stunning views of the surrounding area.  Where I am standing it is a several hundred foot drop straight down to a rocky slope below (notice the lack of guard-rails)…

Advertisements

Non-Numeric Characters in SQL – Follow up

Although my previous post was a bit hasty, I wanted to follow up with some additional comments about the magic in the middle of the UDF created to remove any non-numeric characters from a column in a query result set.  As you can see there is a relatively cryptic body of the function that somewhat resembles a regular expression.  While the solution fit my needs, I have been thinking more about it and realize that the ramifications ripple much more broadly than just being able to strip non-numeric characters out of a string.  My thoughts led me to the following 3 important points to consider:

Important Point #1: LIKE is often underutilized

I have been foolishly using only simple % operations in 99% (blatant overuse of per cent character 🙂 ) of my queries using LIKE.  Most of the time this is plenty of horsepower, but for when you need to dig in a little bit more accurately, it helps to know that additional wildcard characters are supported by the LIKE operator.  In short summation, try a few queries using these additional search options:

  • _ (underscore)
  • [] (range)
  • [^] (not in range)

Important Point #2: PATINDEX() is rad

The real meat of this function is the PATINDEX statement which has the advantage of being able to use wildcards.  This is pointed out with just the right amount of detail in an article by Robert Davis: Pratical Uses of PatIndex.

Important Point #3: Reader beware

One underlying important point is that none of the approaches described here may perform particularly well when working with large volumes of data.  In general, string manipulation should be performed outside of the database.  If dealing with many rows of data, this approach could be pathetically slow.  However, if you are dealing with 1000 records or less you would have to do something pretty extreme to make a query perform unacceptably slow.

Hopefully you will find this useful the next time you are slamming together some ad-hoc SQL to dig through your data.  As mentioned above, this is not recommended for most production applications but can certainly be a handy tool to have in your SQL tool belt.

SQL Server String Manipulation – Removing Non-Numeric Characters

I recently had a need to get a distinct list of numbers from a freeform text field in a database.  After some analysis, my needs were met by the following approach:

CREATE FUNCTION dbo.UFNG_ONLY_DIGITS (@StrVal AS VARCHAR(max))
RETURNS VARCHAR(max)
AS
BEGIN
      WHILE PATINDEX(‘%[^0-9]%’, @StrVal) > 0
            SET @StrVal = REPLACE(@StrVal,
                SUBSTRING(@StrVal,PATINDEX(‘%[^0-9]%’, @StrVal),1),”)
      RETURN @StrVal
END

The function can then be used as follows:

SELECT dbo.UFNG_ONLY_DIGITS(‘8asdf7%87^A8876-*S’)

to return

87878876

Thanks to all who helped in my research with these posts:

http://groups.google.com/group/microsoft.public.sqlserver.programming/msg/032f65303d22e343

http://bytes.com/forum/thread81464.html