The other day I stumbled upon a discussion on Stack Overflow about generating a URL-friendly slug. I found the problem quite interesting and decided to have a go at solving it in .NET Core.
The problem
When creating pages we want the URL to be readable for both humans and SEO bots. The problem is that we often start from a text, title or string that is not URL friendly. To deal with this we usually run the text through some sort of slugifying function that turns it into something that is both compatible with browsers and readable to humans.
E.g. let's say we have the text ICH MUß EINEN CRÈME BRÛLÉE HABEN (which is German for "I must have a crème brûlée").
This text can't be used as a URL since it contains spaces, diacritics and international characters. It's also not very SEO friendly.
The solution
Instead we would like something simpler to use in our URL, e.g. ich-muss-einen-creme-brulee-haben.
To do this we need to:
- Normalize the text
- Remove all diacritics
- Replace international characters
- Be able to shorten text to match SEO thresholds
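The first two steps can be illustrated in isolation. The following is only a minimal sketch, not the final implementation, and the names `DiacriticsSketch` and `RemoveDiacritics` are hypothetical. It relies on `NormalizationForm.FormD` splitting an accented letter into a base letter followed by combining marks, which can then be filtered out by their Unicode category.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class DiacriticsSketch
{
    // Hypothetical helper: strips combining marks after canonical decomposition
    public static string RemoveDiacritics(string text)
    {
        if (text == null) return "";

        // FormD splits e.g. 'è' into 'e' followed by U+0300 (combining grave accent)
        var decomposed = text.Normalize(NormalizationForm.FormD);
        var builder = new StringBuilder(decomposed.Length);

        foreach (var c in decomposed)
        {
            // Skip the combining marks, keep every other character
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                builder.Append(c);
        }

        return builder.ToString();
    }
}
```

With this sketch, `DiacriticsSketch.RemoveDiacritics("crème brûlée")` returns `creme brulee`. A character such as `ß` has no decomposition and passes through unchanged, which is why replacing international characters is a separate step.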
I also had the requirement that it must be compatible with .NET 4.5+ and .NET Core.
NOTE A lot of the solutions found in the thread worked just fine, but they were not fully optimized and had some "performance issues" when slugifying larger quantities of text. The only high-performing code sample I found was this one. It was a bit buggy though, so I tried writing a more modern version of it.
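To compare candidates on larger quantities, a rough timing harness along these lines could be used. This is a sketch under assumptions: `SlugTimingSketch` and `Measure` are hypothetical names, and a plain `Stopwatch` loop only gives a crude indication; a dedicated benchmarking tool would give more reliable numbers.

```csharp
using System;
using System.Diagnostics;

public static class SlugTimingSketch
{
    // Hypothetical harness: runs a slug function repeatedly and returns the elapsed time
    public static TimeSpan Measure(Func<string, string> slugify, string input, int iterations)
    {
        // Warm-up call so that JIT compilation is not part of the measurement
        slugify(input);

        var stopwatch = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++)
        {
            slugify(input);
        }
        stopwatch.Stop();

        return stopwatch.Elapsed;
    }
}
```

Any function that takes a string and returns a string can be passed in, so two slug implementations can be timed against the same input and iteration count.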
The code
UrlFriendly()
I wanted a single function that generates the entire slug and also accepts an optional max length. This was the result.
using System.Globalization;
using System.Text;

public static class StringHelper
{
    /// <summary>
    /// Creates a URL and SEO friendly slug
    /// </summary>
    /// <param name="text">Text to slugify</param>
    /// <param name="maxLength">Max length of slug</param>
    /// <returns>URL and SEO friendly string</returns>
    public static string UrlFriendly(string text, int maxLength = 0)
    {
        // Return empty value if text is null
        if (text == null) return "";

        var normalizedString = text
            // Make lowercase
            .ToLowerInvariant()
            // Normalize the text (splits accented letters into base letter + combining marks)
            .Normalize(NormalizationForm.FormD);

        var stringBuilder = new StringBuilder();
        var stringLength = normalizedString.Length;
        var prevdash = false;
        var trueLength = 0;

        char c;

        for (int i = 0; i < stringLength; i++)
        {
            c = normalizedString[i];

            switch (CharUnicodeInfo.GetUnicodeCategory(c))
            {
                // Check if the character is a letter or a digit. If the character is an
                // international character, remap it to an ASCII valid character
                case UnicodeCategory.LowercaseLetter:
                case UnicodeCategory.UppercaseLetter:
                case UnicodeCategory.DecimalDigitNumber:
                    if (c < 128)
                    {
                        stringBuilder.Append(c);
                        prevdash = false;
                    }
                    else
                    {
                        var mapped = ConstHelper.RemapInternationalCharToAscii(c);

                        // Only reset prevdash if something was appended; otherwise an
                        // unmapped character between separators would yield a double hyphen
                        if (mapped.Length > 0)
                        {
                            stringBuilder.Append(mapped);
                            prevdash = false;
                        }
                    }

                    trueLength = stringBuilder.Length;
                    break;

                // Check if the character is to be replaced by a hyphen, but only if the last character wasn't
                case UnicodeCategory.SpaceSeparator:
                case UnicodeCategory.ConnectorPunctuation:
                case UnicodeCategory.DashPunctuation:
                case UnicodeCategory.OtherPunctuation:
                case UnicodeCategory.MathSymbol:
                    if (!prevdash)
                    {
                        stringBuilder.Append('-');
                        prevdash = true;
                        trueLength = stringBuilder.Length;
                    }
                    break;
            }

            // If we are at max length, stop parsing
            if (maxLength > 0 && trueLength >= maxLength)
                break;
        }

        // Trim excess hyphens
        var result = stringBuilder.ToString().Trim('-');

        // Remove any excess characters to meet the max length criteria
        return maxLength <= 0 || result.Length <= maxLength ? result : result.Substring(0, maxLength);
    }
}
RemapInternationalCharToAscii()
This helper remaps some international characters to readable ASCII equivalents.
public static class ConstHelper
{
    /// <summary>
    /// Remaps international characters to ASCII compatible ones
    /// based on: https://meta.stackexchange.com/questions/7435/non-us-ascii-characters-dropped-from-full-profile-url/7696#7696
    /// </summary>
    /// <param name="c">Character to remap</param>
    /// <returns>Remapped character</returns>
    public static string RemapInternationalCharToAscii(char c)
    {
        string s = c.ToString().ToLowerInvariant();
        if ("àåáâäãą".Contains(s))
        {
            return "a";
        }
        else if ("èéêëę".Contains(s))
        {
            return "e";
        }
        else if ("ìíîïı".Contains(s))
        {
            return "i";
        }
        else if ("òóôõöøőð".Contains(s))
        {
            return "o";
        }
        else if ("ùúûüŭů".Contains(s))
        {
            return "u";
        }
        else if ("çćčĉ".Contains(s))
        {
            return "c";
        }
        else if ("żźž".Contains(s))
        {
            return "z";
        }
        else if ("śşšŝ".Contains(s))
        {
            return "s";
        }
        else if ("ñń".Contains(s))
        {
            return "n";
        }
        else if ("ýÿ".Contains(s))
        {
            return "y";
        }
        else if ("ğĝ".Contains(s))
        {
            return "g";
        }
        else if (c == 'ř')
        {
            return "r";
        }
        else if (c == 'ł')
        {
            return "l";
        }
        else if (c == 'đ')
        {
            return "d";
        }
        else if (c == 'ß')
        {
            return "ss";
        }
        else if (c == 'þ')
        {
            return "th";
        }
        else if (c == 'ĥ')
        {
            return "h";
        }
        else if (c == 'ĵ')
        {
            return "j";
        }
        else
        {
            // Unmapped characters are dropped
            return "";
        }
    }
}
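As a possible variation on the chain of string comparisons, the same remapping could be expressed as a dictionary lookup. This is only a sketch: `RemapSketch`, `Map` and `Remap` are hypothetical names, and the table is deliberately incomplete. It assumes the input was already normalized with FormD, so most accented letters arrive as plain base letters and only characters without a decomposition need an entry.

```csharp
using System.Collections.Generic;

public static class RemapSketch
{
    // Hypothetical lookup table; only a handful of entries are shown for illustration
    private static readonly Dictionary<char, string> Map = new Dictionary<char, string>
    {
        { 'ß', "ss" },
        { 'þ', "th" },
        { 'ø', "o" },
        { 'ð', "o" },
        { 'ł', "l" },
        { 'đ', "d" },
        { 'ı', "i" },
    };

    // Returns the ASCII replacement, or an empty string for unmapped characters
    public static string Remap(char c)
    {
        string mapped;
        return Map.TryGetValue(c, out mapped) ? mapped : "";
    }
}
```

A lookup like this avoids allocating a new string for every character, at the cost of having to list each character explicitly. The replacement values shown here follow the ones used in the helper above.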
Result
The function would work something like this:
const string text = "ICH MUß EINIGE CRÈME BRÛLÉE HABEN";
Console.WriteLine(StringHelper.UrlFriendly(text));
// Output:
// ich-muss-einige-creme-brulee-haben
The code seems to work, and you can find the entire source code, along with some samples, here on GitHub.