Sunday, April 19, 2009

Regex - Reformat a string by regular expressions

Sometimes we have strings that are written in different formats but mean the same thing. In that case it may be better to convert them all to one standard format in our report, which keeps it consistent and makes it easier for users to read.

One example is the format of a phone number. If we take "(###) ###-####" as the standard format, here is one solution using a regular expression, although many other regular expressions would work as well.

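// Capture the area code, prefix and line number; parentheses and separators are optional.
Match m = Regex.Match(str,
    @"^\(?(\d{3})\)?[\s\-]?(\d{3})\-?(\d{4})$" );
// Rebuild the number in the standard format (check m.Success first in real code).
string newStr = String.Format("({0}) {1}-{2}",
    m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value );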
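
The snippet above assumes the input always matches. As a minimal sketch of how the same pattern could be wrapped for reuse, the class below adds a check of Match.Success. The names PhoneFormatter and Normalize are hypothetical, and returning the input unchanged on a failed match is an assumption, not something the original snippet does.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneFormatter
{
    // Same pattern as above: optional parentheses and optional separators.
    private static readonly Regex PhonePattern =
        new Regex(@"^\(?(\d{3})\)?[\s\-]?(\d{3})\-?(\d{4})$");

    // Hypothetical helper: returns the number as "(###) ###-####",
    // or the input unchanged when it does not match (an assumed policy).
    public static string Normalize(string input)
    {
        Match m = PhonePattern.Match(input);
        if (!m.Success)
        {
            return input;
        }
        return String.Format("({0}) {1}-{2}",
            m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value);
    }
}
```

With this sketch, "5551234567", "555-123-4567" and "(555) 123-4567" should all come back as "(555) 123-4567", while a string such as "12345" is returned as it is.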


In the pattern above, each \d{n} part is enclosed in parentheses, which makes it a separate capturing group. The captured substrings are available through the Match.Groups collection (index 0 holds the whole match, so the captures start at index 1) and can easily be passed to String.Format. Another way to reformat a string is Regex.Replace. The following example uses its static overload to convert dates from mm/dd/yy format to dd-mm-yy format:

string newStr = Regex.Replace(str,
    @"\b(?<month>\d{1,2})/(?<day>\d{1,2})/(?<year>\d{2,4})\b",
    "${day}-${month}-${year}" );

In the replacement string, ${day} inserts the substring captured by the named group (?<day>\d{1,2}), and ${month} and ${year} work the same way for their groups.
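
To see the replacement applied to a whole sentence, here is a small sketch. It assumes the three groups are named month, day and year, as the replacement string suggests. The names DateReformatter and MonthFirstToDayFirst are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DateReformatter
{
    // Hypothetical helper: rewrites every mm/dd/yy date found in the input
    // as dd-mm-yy and leaves all other text untouched.
    public static string MonthFirstToDayFirst(string input)
    {
        return Regex.Replace(input,
            @"\b(?<month>\d{1,2})/(?<day>\d{1,2})/(?<year>\d{2,4})\b",
            "${day}-${month}-${year}");
    }
}
```

For example, "Due 04/19/09 and 12/31/2009." should become "Due 19-04-09 and 31-12-2009.". Regex.Replace replaces every match in the string, not only the first one.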

Note: \b in the pattern above specifies that the match must occur on a boundary between a \w character (a letter, digit or underscore) and a \W character (anything else), or at the start or end of the string next to a \w character. In other words it marks a word boundary, so a date is matched only when it is not embedded in a longer run of word characters.
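
The effect of \b is easiest to see by comparing the date pattern with and without it. The sketch below is only an illustration. The class BoundaryDemo and its two methods are hypothetical names, and the groups are assumed to be named month, day and year.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // Date pattern anchored on word boundaries at both ends.
    private const string WithBoundary =
        @"\b(?<month>\d{1,2})/(?<day>\d{1,2})/(?<year>\d{2,4})\b";

    // Same pattern without \b, shown only for comparison.
    private const string WithoutBoundary =
        @"(?<month>\d{1,2})/(?<day>\d{1,2})/(?<year>\d{2,4})";

    public static bool MatchesWithBoundary(string input)
    {
        return Regex.IsMatch(input, WithBoundary);
    }

    public static bool MatchesWithoutBoundary(string input)
    {
        return Regex.IsMatch(input, WithoutBoundary);
    }
}
```

For an input such as "id123/45/6789", the version without \b finds "23/45/6789" inside the longer token, while the version with \b finds nothing. A plain date such as "4/19/2009" is matched by both.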