In the world of web development with .NET, it’s crucial to make sure that the web addresses (URLs) our application uses are correct and safe. In this article, we’ll look at the basics of checking whether a URL is valid in C#, with some easy-to-understand code examples that help us implement these validations in our projects.
So let’s dive in.
Basics of URL Structure
A URL, or Uniform Resource Locator, serves as the address of a web resource. It’s a string made up of several components: the scheme (like “http” or “https”), the domain or hostname (such as “www.example.com”), and the path to the specific resource (e.g. “/v1/products”). Optional components include a port, query parameters, and a fragment.
For instance, let’s analyze the structure of the URL https://www.example.com/path/page?query=value#section:
Component | Value |
---|---|
Scheme | https |
Domain | www.example.com |
Path | /path/page |
Query parameters | query=value |
Fragment | section |
Understanding these components is fundamental to effective URL validation and navigation in web development.
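To connect the table to code, here’s a small sketch that parses the same example URL with the built-in System.Uri class (which we’ll use for validation later) and reads each component back. Note that Uri keeps the leading “?” and “#” in its Query and Fragment properties.

```csharp
using System;

// Parse the example URL from the table and read each component back.
var uri = new Uri("https://www.example.com/path/page?query=value#section");

Console.WriteLine(uri.Scheme);       // https
Console.WriteLine(uri.Host);         // www.example.com
Console.WriteLine(uri.AbsolutePath); // /path/page
Console.WriteLine(uri.Query);        // ?query=value
Console.WriteLine(uri.Fragment);     // #section
```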
Check if the URL Is Valid Using Regular Expressions (Regex)
One effective method for URL validation in C# is the built-in Regex class (regular expressions, from the System.Text.RegularExpressions namespace). Using Regex, we can define patterns that URLs should adhere to. This allows us to perform flexible and customizable validation.
Let’s define a UrlValidator class with a single static method that uses a Regex to validate web page URLs:
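using System.Text.RegularExpressions;

public static class UrlValidator
{
    public static bool ValidateUrlWithRegex(string url)
    {
        // A new Regex (and a fresh parse of the pattern) is created on every call.
        var urlRegex = new Regex(
            @"^(https?|ftps?):\/\/" +                                  // scheme and "://"
            @"(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+" +  // one or more labels ending in "."
            @"[a-zA-Z]{2,}" +                                          // top-level domain
            @"(?::(?:0|[1-9]\d{0,3}|[1-5]\d{4}|6[0-4]\d{3}" +          // optional port, 0 to 65535
            @"|65[0-4]\d{2}|655[0-2]\d|6553[0-5]))?" +
            @"(?:\/(?:[-a-zA-Z0-9@%_\+.~#?&=]+\/?)*)?$",               // optional path, query, fragment
            RegexOptions.IgnoreCase);

        return urlRegex.IsMatch(url);
    }
}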
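Here, we validate a URL against a simplified set of common URL conventions. The pattern does not implement the full RFC 3986 grammar; for example, it rejects IP-address hosts and “localhost”. Let’s break this pattern into smaller parts and take a closer look: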
Pattern | Explanation |
---|---|
(https?|ftps?):\/\/ | This expression checks for the scheme (either HTTP(s) or FTP(s)) followed by "://". |
(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,} | Checks for appropriate domain names, including subdomains. |
(?::(?:0|[1-9]\d{0,3}|[1-5]\d{4}|6[0-4]\d{3}|65[0-4]\d{2}|655[0-2]\d|6553[0-5]))? | Expression for an optional port number between 0 and 65535. |
(?:\/(?:[-a-zA-Z0-9@%_\+.~#?&=]+\/?)*)? | This pattern checks for correct URL path structure (along with optional query parameters and a fragment) with valid characters and optional trailing slashes. |
Together, these parts accept most everyday HTTP(S) and FTP(S) URLs while rejecting common formatting mistakes. Now, let’s see the validation in action:
var url = "https://www.amazon.com";
var success = UrlValidator.ValidateUrlWithRegex(url);
Console.WriteLine($"The URL '{url}' is {(success ? "valid" : "invalid")}.");

var url2 = "ftp:////example.com///one?param=true";
success = UrlValidator.ValidateUrlWithRegex(url2);
Console.WriteLine($"The URL '{url2}' is {(success ? "valid" : "invalid")}.");
Here, we test the ValidateUrlWithRegex() method with two input URLs, printing the results to the console.
Let’s check the console output:
The URL 'https://www.amazon.com' is valid.
The URL 'ftp:////example.com///one?param=true' is invalid.
Here, we see that our Regex validation correctly accepts the first URL and rejects the second one, since it contains extra slash (‘/’) characters.
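Because ValidateUrlWithRegex() builds a new Regex on every call, the pattern is parsed again each time. A common alternative, sketched below with the same pattern, is to create the Regex once and keep it in a static readonly field. The CachedUrlValidator class and its IsValid() method are hypothetical names, and we haven’t benchmarked this variant in this article; on .NET 7 and later, the [GeneratedRegex] source generator is another option.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical variant of UrlValidator that builds its Regex only once.
public static class CachedUrlValidator
{
    // Same pattern as ValidateUrlWithRegex(), created a single time and reused.
    // RegexOptions.Compiled trades a slower first use for faster matching afterwards.
    private static readonly Regex UrlRegex = new(
        @"^(https?|ftps?):\/\/" +
        @"(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+" +
        @"[a-zA-Z]{2,}" +
        @"(?::(?:0|[1-9]\d{0,3}|[1-5]\d{4}|6[0-4]\d{3}" +
        @"|65[0-4]\d{2}|655[0-2]\d|6553[0-5]))?" +
        @"(?:\/(?:[-a-zA-Z0-9@%_\+.~#?&=]+\/?)*)?$",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static bool IsValid(string url) => UrlRegex.IsMatch(url);
}
```

Only the construction strategy changes, so the accepted and rejected URLs stay the same as with ValidateUrlWithRegex().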
Check if the URL Is Valid Using the Built-in URI Class
The built-in Uri class is another option in .NET that provides us with a more straightforward approach to URL validation. We’re going to analyze two ways we can use it to validate URLs.
Using Uri.TryCreate
The first method we are going to look at is Uri.TryCreate(). Its simplicity and ease of use make it a good choice for most use cases.
However, we should note that Uri may accept some URLs that are technically incorrect according to the URI specifications, and such URLs may behave unexpectedly in certain scenarios. Its validation is more relaxed than that of our regular expression, so we may need additional validation steps for specific use cases.
Let’s define another validation method, ValidateUrlWithUriCreate(), in our validator class:
public static bool ValidateUrlWithUriCreate(string url, out Uri? uri)
{
    // Accepts both absolute URLs and relative references.
    var success = Uri.TryCreate(url, UriKind.RelativeOrAbsolute, out uri);

    return success;
}
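Here, we pass the URL we want to validate as the first parameter to the Uri.TryCreate() method. Then, with UriKind.RelativeOrAbsolute, we specify that we accept either a relative or an absolute URL. The method returns true if the string can be parsed, and stores the resulting Uri object in the uri out parameter. Keep in mind that accepting relative URLs makes this check very permissive: a plain word such as “hello” is a valid relative reference, so UriKind.Absolute is usually the better choice when we expect a full web address.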
Now, let’s see it in action:
url = "https://api.facebook.com:443";
success = UrlValidator.ValidateUrlWithUriCreate(url, out _);
Console.WriteLine($"The URL '{url}' is {(success ? "valid" : "invalid")}.");

url2 = "ftp:///api.site.com?value=word1 word2";
success = UrlValidator.ValidateUrlWithUriCreate(url2, out _);
Console.WriteLine($"The URL '{url2}' is {(success ? "valid" : "invalid")}.");
Similar to our previous method, we test the ValidateUrlWithUriCreate() method with another two URLs.
Now, let’s check the console output:
The URL 'https://api.facebook.com:443' is valid.
The URL 'ftp:///api.site.com?value=word1 word2' is invalid.
Again, our method correctly accepts the first URL and rejects the second one, which has no host name between “ftp://” and the path.
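When we specifically expect a full web address, a stricter sketch is to parse with UriKind.Absolute and then check the scheme. The scheme check matters because an absolute URI isn’t necessarily a web URL: “mailto:” and “ftp:” URIs parse successfully, and on Unix-like systems a path such as “/var/log” is parsed as an absolute file URI. The WebUrlChecks class and IsAbsoluteHttpUrl() method are hypothetical names.

```csharp
using System;

public static class WebUrlChecks
{
    // Hypothetical helper: accepts only absolute http:// and https:// URLs.
    public static bool IsAbsoluteHttpUrl(string url)
    {
        return Uri.TryCreate(url, UriKind.Absolute, out var uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }
}
```

With this helper, “https://api.facebook.com:443” passes, while “hello”, “mailto:someone@example.com”, and “ftp://example.com” are rejected, even though “hello” is accepted by a RelativeOrAbsolute check as a relative reference.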
Using Uri.IsWellFormedUriString
Apart from the TryCreate() method, the Uri class gives us another mechanism for stricter validation: the Uri.IsWellFormedUriString() method.
The Uri.IsWellFormedUriString() method checks whether the string is a well-formed URI according to the RFC 3986 and RFC 3987 specifications for URI syntax. It does so by attempting to construct a Uri from the string, and it additionally requires that the string does not need any further character escaping.
First, let’s define a ValidateUrlWithUriWellFormedString() method in our UrlValidator class:
public static bool ValidateUrlWithUriWellFormedString(string url)
{
    // Well-formed per the URI RFCs and requiring no further escaping.
    var success = Uri.IsWellFormedUriString(url, UriKind.RelativeOrAbsolute);

    return success;
}
Here, we simply call the method and specify that we accept either a Relative or an Absolute URL.
Next, we can use it to validate 2 URLs:
url = "https://site.company?q=search";
success = UrlValidator.ValidateUrlWithUriWellFormedString(url);
Console.WriteLine($"The URL '{url}' is {(success ? "valid" : "invalid")}.");

url2 = "ftp://api.site.com?value=word1 word2";
success = UrlValidator.ValidateUrlWithUriWellFormedString(url2);
Console.WriteLine($"The URL '{url2}' is {(success ? "valid" : "invalid")}.");
Again, we run validations on 2 input URLs, and can then inspect the console:
The URL 'https://site.company?q=search' is valid.
The URL 'ftp://api.site.com?value=word1 word2' is invalid.
Here, we see that the first URL is well-formed. However, the second one is rejected because its query string contains an unescaped space: the white space between the words should be encoded as either “%20” or a “+”.
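The difference between the two Uri methods shows up clearly with an unescaped space. In the sketch below (the api.site.com address is just a placeholder), TryCreate() accepts the raw string because Uri escapes the space internally, IsWellFormedUriString() rejects it, and escaping the value first with Uri.EscapeDataString() makes the URL well-formed.

```csharp
using System;

const string baseUrl = "https://api.site.com/search?value=";

// Raw value with a space: parseable, but not well-formed.
var raw = baseUrl + "word1 word2";
bool parsedRaw = Uri.TryCreate(raw, UriKind.Absolute, out _);
bool wellFormedRaw = Uri.IsWellFormedUriString(raw, UriKind.Absolute);

// Escape the query value before building the URL.
var escaped = baseUrl + Uri.EscapeDataString("word1 word2");
bool wellFormedEscaped = Uri.IsWellFormedUriString(escaped, UriKind.Absolute);

Console.WriteLine($"{parsedRaw} {wellFormedRaw} {wellFormedEscaped}"); // True False True
Console.WriteLine(escaped); // https://api.site.com/search?value=word1%20word2
```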
Check if the URL Is Valid With an HTTP Request
Another way to validate a URL is to send an HTTP request to it and check the server’s response status. This way, we can confirm that the specified URL actually exists and is accessible.
Drawbacks and Security Risks
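While this approach provides real-time validation, its drawbacks include the dependency on network connectivity and the performance overhead of making an actual request. Additionally, the status code doesn’t always reflect whether the resource exists: some servers return a success status even for missing pages, and others reject HEAD requests (for example, with 405 Method Not Allowed) or block automated clients.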
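When making network calls to foreign domains or URLs, it’s important to consider potential security risks. These include server-side request forgery (SSRF), where a user-supplied URL makes our server call internal services, as well as the trustworthiness of external domains, the potential for malicious content, and data privacy concerns. Browser-side mechanisms such as cross-origin resource sharing (CORS) and content security policies (CSP) don’t apply to requests our server makes with HttpClient, so to mitigate these risks we rely on measures such as restricting requests to HTTPS, validating input (for example, allow-listing schemes and hosts), and setting reasonable timeouts.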
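One way to apply input validation before making any outbound call is an allow-list check. The sketch below uses a hypothetical OutboundUrlPolicy class with a hard-coded host list standing in for real configuration, and only permits HTTPS URLs whose host is explicitly trusted. It doesn’t cover every SSRF vector (for example, DNS records that change after the check), so we should treat it as a first layer rather than a complete defense.

```csharp
using System;
using System.Collections.Generic;

public static class OutboundUrlPolicy
{
    // Hypothetical allow-list; a real application would load this from configuration.
    private static readonly HashSet<string> AllowedHosts =
        new(StringComparer.OrdinalIgnoreCase) { "api.facebook.com", "www.example.com" };

    public static bool IsAllowed(string url)
    {
        // Only absolute HTTPS URLs are considered at all.
        if (!Uri.TryCreate(url, UriKind.Absolute, out var uri) || uri.Scheme != Uri.UriSchemeHttps)
        {
            return false;
        }

        // Anything not explicitly listed (internal names, localhost, raw IP addresses) is rejected.
        return AllowedHosts.Contains(uri.Host);
    }
}
```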
Implement an HTTP Request to Check if the URL Is Valid
Let’s now observe how we can use this strategy to validate URLs:
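// Requires: using System.Net.Http; using System.Net.Sockets;
// A single shared HttpClient avoids exhausting sockets when many URLs are checked.
private static readonly HttpClient SharedHttpClient = new();

public static async Task<bool> ValidateUrlWithHttpClient(string url)
{
    try
    {
        // HEAD asks only for the status line and headers, not the response body.
        using var request = new HttpRequestMessage(HttpMethod.Head, url);
        using var response = await SharedHttpClient.SendAsync(request);

        // SendAsync() does not throw for non-success status codes, so we check them here.
        // 5XX means the server exists but is temporarily failing: we still treat the URL as valid.
        return response.IsSuccessStatusCode || (int)response.StatusCode >= 500;
    }
    catch (HttpRequestException e)
        when (e.InnerException is SocketException { SocketErrorCode: SocketError.HostNotFound })
    {
        // DNS could not resolve the host name.
        return false;
    }
}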
Here, we use .NET’s built-in HttpClient to send an HTTP request to the targeted URL. Note that we specify the HTTP HEAD method, as we’re only interested in the remote server returning a success status code, indicating that the requested resource has been found; we don’t need the response body.
If the host can’t be found (meaning DNS couldn’t resolve it), we expect the HTTP call to throw an HttpRequestException. This exception wraps an inner SocketException whose SocketErrorCode property is set to HostNotFound, which indicates that DNS hasn’t been able to resolve the host name.
It’s also important to note that the requested resource might be temporarily unavailable (e.g. the server returns a 5XX status code), in which case we still consider the URL valid.
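Since HttpClient.SendAsync() returns non-success responses instead of throwing, deciding which status codes count as “valid” is up to us. The rule described above can be isolated in a small helper that’s easy to unit test without a network connection; StatusCodeRules and IsValidStatus() are hypothetical names.

```csharp
using System.Net;

public static class StatusCodeRules
{
    // Hypothetical helper expressing the rule from the text:
    // 2XX means the resource was found; 5XX means the server exists but is
    // temporarily failing, so we still treat the URL as valid.
    // Everything else (unfollowed 3XX redirects, 4XX client errors) counts as invalid.
    public static bool IsValidStatus(HttpStatusCode statusCode)
    {
        var code = (int)statusCode;
        return (code >= 200 && code <= 299) || code >= 500;
    }
}
```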
Next, let’s see this validation in action:
url = "https://api.facebook.com";
success = await UrlValidator.ValidateUrlWithHttpClient(url);
Console.WriteLine($"The URL '{url}' is {(success ? "valid" : "invalid")}.");

url2 = "https://www.example-nonexistent-url.com";
success = await UrlValidator.ValidateUrlWithHttpClient(url2);
Console.WriteLine($"The URL '{url2}' is {(success ? "valid" : "invalid")}.");
This time, we use our ValidateUrlWithHttpClient() method to validate the URLs.
We can once again check the console output:
The URL 'https://api.facebook.com' is valid.
The URL 'https://www.example-nonexistent-url.com' is invalid.
Our console results indicate that the first URL was successfully accessed, whilst the second one hasn’t been resolved by DNS, hence we consider it invalid.
Benchmark URL Validation Methods
Ultimately, the method we choose depends on our application’s validation and performance requirements. Let’s now compare the methods we discussed by running some performance benchmarks with BenchmarkDotNet. We are going to test the validations against the URL https://site.company?q=search.
With that, let’s assess the results:
Method | Mean | Error | StdDev | Allocated |
---|---|---|---|---|
UriCreateValidationBenchmark | 85.49 ns | 1.024 ns | 0.908 ns | 56 B |
UriWellFormedStringValidationBenchmark | 195.51 ns | 3.702 ns | 3.281 ns | 136 B |
RegexUrlValidationBenchmark | 20,660.69 ns | 401.489 ns | 763.874 ns | 23256 B |
HttpClientValidationBenchmark | 487,029,002.56 ns | 16,963,239.617 ns | 43,787,538.241 ns | 67560 B |
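From our results, we learn that the Uri.TryCreate() method is the fastest option for URL validation, at about 85 ns per call, making it ideal for quick and efficient validation of basic URLs. Next comes the Uri.IsWellFormedUriString() method, which runs roughly 2.3 times slower. Regex validation comes in third place at about 20.7 µs, more than 200 times slower than TryCreate(), and allocates around 23 KB per call; most of that cost most likely comes from ValidateUrlWithRegex() constructing a new Regex, and therefore re-parsing the pattern, on every invocation. Finally, validation using HTTP calls is by far the slowest, at roughly half a second per request, due to the network communication.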
Best Practices for URL Validation and Comparison Between Different Methods
Adhering to best practices for URL validation in .NET involves combining multiple validation methods, such as Regex patterns and the Uri class, to create a robust validation strategy. It’s essential to balance strict validation and practical flexibility based on our application’s needs.
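As an illustration of combining methods, here’s a sketch of a layered check built from the Uri members we’ve already used plus Uri.CheckHostName(). It requires a well-formed absolute URL, an http or https scheme, and a DNS-style host name. The class and method names are hypothetical, and the exact layers (for example, whether to also apply our Regex or to accept IP addresses) depend on our application’s rules.

```csharp
using System;

public static class LayeredUrlValidator
{
    // Hypothetical combination of the checks discussed in this article.
    public static bool IsValidWebUrl(string url)
    {
        // 1. Strict syntax: well-formed and needing no further escaping.
        if (!Uri.IsWellFormedUriString(url, UriKind.Absolute))
            return false;

        // 2. Parse so we can inspect individual components.
        if (!Uri.TryCreate(url, UriKind.Absolute, out var uri))
            return false;

        // 3. Only web schemes.
        if (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps)
            return false;

        // 4. Require a DNS host name rather than an IP address or another host type.
        return Uri.CheckHostName(uri.Host) == UriHostNameType.Dns;
    }
}
```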
For basic validation, especially when we also want the parsed Uri object, Uri.TryCreate() is a suitable choice: it’s the fastest of the methods we measured and provides comprehensive parsing capabilities. When we need stricter, standards-based checking, such as rejecting strings that still require escaping, Uri.IsWellFormedUriString() is the better fit at a small additional cost.
If we need detailed control over the validation pattern or have specific requirements not covered by built-in methods, then regular expressions offer us the most flexibility. However, they require careful crafting and testing, and we should reuse Regex instances rather than constructing a new one on every call.
Lastly, making HTTP calls for URL validation is appropriate when the URL’s availability and content are critical. Nonetheless, we should use them carefully due to their resource-intensive nature.
Conclusion
In this article, we’ve explored diverse and effective methods for validating URLs in C#. By utilizing Regex patterns, the built-in URI class, and even real-time checks with HTTP requests, we now have a toolbox of techniques to ensure the accuracy and security of URLs in our applications.
Remember, choosing the right method depends on our specific requirements, and combining them allows us to create a robust URL validation strategy tailored to our projects’ needs.