Python 3 regular expression to fetch domains from a string or paragraph

To extract domain names from a string or paragraph, you can start with the following pattern:

(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z0-9][a-z0-9-]{0,61}[a-z0-9]
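For readability, the same pattern can be written in Python's re.VERBOSE mode, where whitespace outside character classes is ignored and # starts a comment. This is only a re-layout of the pattern above. The sample string is made up, and the final check confirms that both forms find the same matches.

```python
import re

compact = r"(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z0-9][a-z0-9-]{0,61}[a-z0-9]"

verbose = re.compile(
    r"""
    (?:                       # one or more labels, each followed by a dot
        [a-z0-9]              # a label starts with a letter or digit
        (?:[a-z0-9-]{0,61}    # then up to 61 letters, digits, or hyphens
           [a-z0-9])?         # and ends with a letter or digit (max 63 chars)
        \.                    # literal dot separating labels
    )+
    [a-z0-9]                  # final label starts with a letter or digit
    [a-z0-9-]{0,61}           # middle characters
    [a-z0-9]                  # ends with a letter or digit (so at least 2 chars)
    """,
    re.VERBOSE,
)

sample = "Mirrors: example.com, sub.example.org and a-b.example.net"
print(verbose.findall(sample) == re.findall(compact, sample))  # True
```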

This regular expression uses the following elements:

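(?:...) is a non-capturing group: it groups parts of the pattern without capturing them, so re.findall() returns the whole match instead of the group's contents.
[a-z0-9] matches any lowercase letter or digit, so a label can contain hyphens but cannot begin or end with one.
[a-z0-9-]{0,61} matches any lowercase letter, digit, or hyphen, occurring 0 to 61 times. Together with the required first and last characters, this caps each label at 63 characters, the DNS limit.
? makes the preceding group optional, so single-character labels such as “a” are also accepted.
+ is a quantifier that matches the preceding group one or more times, allowing any number of subdomain labels.
\. matches a literal dot.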
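The non-capturing group matters when you use Python's re.findall(): if a pattern contains a capturing group, findall() returns that group's contents instead of the whole match, and for a repeated group only the last repetition is kept. A small comparison on a made-up sentence, where the second pattern is the same one with its outer group made capturing:

```python
import re

text = "Visit subdomain.example.com today."

# Original pattern: the outer group is non-capturing, so findall returns whole matches.
non_capturing = r"(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z0-9][a-z0-9-]{0,61}[a-z0-9]"
# Same pattern with the outer group made capturing.
capturing = r"([a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z0-9][a-z0-9-]{0,61}[a-z0-9]"

print(re.findall(non_capturing, text))  # ['subdomain.example.com']
print(re.findall(capturing, text))      # ['example.'] (last repetition of the group only)
```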

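This regular expression matches one or more labels that each end in a dot, followed by a final label (normally the top-level domain, or TLD). For example, it matches “example.com”, “subdomain.example.com”, and “sub.subdomain.example.com”. It does not strip “www”, however: “www” is an ordinary label, so “www.example.com” is matched in full. The pattern is also case-sensitive, so it misses “Example.COM” unless you pass re.IGNORECASE, and because digits are allowed in every label it also matches strings such as the version number “3.14”.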
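A quick check in Python makes this behavior concrete. The pattern is the one given above, unchanged; the input strings are made-up examples. Note in particular that “www” is kept as part of the match.

```python
import re

pattern = r"(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z0-9][a-z0-9-]{0,61}[a-z0-9]"

# "www" is a valid label, so it stays in the match.
print(re.findall(pattern, "Go to www.example.com"))  # ['www.example.com']

# The character classes are lowercase-only, so uppercase input is missed...
print(re.findall(pattern, "Go to Example.COM"))  # []
# ...unless the search is made case-insensitive.
print(re.findall(pattern, "Go to Example.COM", re.IGNORECASE))  # ['Example.COM']

# Digits are allowed in every label, including the last one.
print(re.findall(pattern, "Upgrade to version 3.14"))  # ['3.14']
```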

To use this regular expression to extract domain names from a paragraph, you can use a regular expression library in your programming language of choice. For example, in Python you can use the re module:

import re

pattern = r"(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z0-9][a-z0-9-]{0,61}[a-z0-9]"

text = "The website is located at example.com. Please visit us at subdomain.example.com."

matches = re.findall(pattern, text)

print(matches)  # Output: ['example.com', 'subdomain.example.com']
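If you need to leave out a leading “www.”, accept uppercase input, and avoid matching version numbers such as “3.14”, a stricter variant can help. The sketch below is one possible adaptation under those assumptions, not the pattern above: it adds a word boundary (\b), a negative lookahead (?!www\.) so a match cannot begin on a “www.” label and instead begins at the next label, a letters-only final label of 2 to 63 characters, and the re.IGNORECASE flag. The name extract_domains is hypothetical.

```python
import re

# Stricter variant (an assumption, not the original pattern):
#   \b          - start only at a word boundary, never in the middle of a label
#   (?!www\.)   - do not start on a "www." label, so the match begins at the next label
#   [a-z]{2,63} - final label (TLD) must be 2-63 letters, which rules out "3.14"
DOMAIN_RE = re.compile(
    r"\b(?!www\.)"
    r"(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+"
    r"[a-z]{2,63}\b",
    re.IGNORECASE,  # also accept "Example.COM" and "WWW."
)


def extract_domains(text):
    """Hypothetical helper: return every domain in text, without a leading "www."."""
    return DOMAIN_RE.findall(text)


text = "Visit www.example.com or Sub.Example.org, not version 3.14."
print(extract_domains(text))  # ['example.com', 'Sub.Example.org']
```

The trade-off is that a letters-only TLD rejects numeric strings but also rejects anything whose last label contains digits, so relax that part if your input can contain such names.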

