Regex Url matching Contest


Marsh

Help the owners fix the URL matching code! Write your own regex!! Winner will receive bragging rights for life. There is a template for URLs in my RegExr link.

Contest ends on Monday the 6th.

Scoring:
+ 1 point for each matched url
- 1 point for each matched url that should not be matched (see template)
If tie, shortest regex.

Multiple submissions are allowed.  Any copy pastes from online will be removed from the contest.

Go!

* * *

First submission to get the ball rolling

1st
```
(http://|www\.)[^ \t\n]+
```
2nd
```
(http://|www\.|https://|ftp:)[^ \t\n]+
```
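
For anyone wondering how the scoring works out, here is a rough sketch of the rules applied to the two submissions above. The `score` helper and the two sample lists are made up for illustration; they are not the actual template from the RegExr link, so real scores will differ.

```javascript
// Hypothetical scorer: +1 for each URL that should match and does,
// -1 for each string that should not match but does.
function score(regex, shouldMatch, shouldNotMatch) {
  let points = 0;
  for (const url of shouldMatch) {
    if (regex.test(url)) points += 1;
  }
  for (const url of shouldNotMatch) {
    if (regex.test(url)) points -= 1;
  }
  return points;
}

// The two submissions from this post, written as JavaScript literals.
const first = /(http:\/\/|www\.)[^ \t\n]+/;
const second = /(http:\/\/|www\.|https:\/\/|ftp:)[^ \t\n]+/;

// Made-up sample data, NOT the contest template.
const shouldMatch = ["http://example.com", "https://example.com", "www.example.com"];
const shouldNotMatch = ["example", "http://"];

console.log(score(first, shouldMatch, shouldNotMatch));  // 2
console.log(score(second, shouldMatch, shouldNotMatch)); // 3
```

The first regex misses the https URL in this sample, which is the gap the second one closes.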

[Template and my submission link. Click Me!](http://regexr.com/3b9u8)

**Rankings:**
1st. PandaCoder - 36 points.
2nd. TehDoug - 24 points.
3rd. Lemon - 24 points.
4th. Marsh - 11 points.
5th. creatorfromhell - 5 points.
* * *

Well, I give up trying to get my last rule in place; it was for if you have a slash immediately after a dot. Whatever, it's perfect otherwise:

```
text.replace(/((https?|ftp):\/\/(([^.\/\s#?]+?)(?=\.)([^\s]+)))/g, '[$3]($1)')
```
http://regexr.com/3b9ta

Longer than Marsh's, but it covers more rules, so it makes fewer matches on invalid URLs.
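
As a quick illustration of what that replace call does, here is a small sketch. The regex is copied from the snippet above; the `linkify` wrapper and the sample strings are made up for the example.

```javascript
// Regex copied from the post above; everything else is illustrative.
const urlRegex = /((https?|ftp):\/\/(([^.\/\s#?]+?)(?=\.)([^\s]+)))/g;

function linkify(text) {
  // $1 is the whole URL, $3 is the URL without its scheme.
  return text.replace(urlRegex, '[$3]($1)');
}

console.log(linkify("see http://example.com/page now"));
// see [example.com/page](http://example.com/page) now

console.log(linkify("no dot here: http://localhost"));
// unchanged, because the first host label must be followed by a dot
```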

Not sure what my score is, since I can't figure out how Marsh got his score from his own scoring system…
* * *

Well, my regex started off looking nice….it has become unsightly, and I don't even expect to win so x3.
```
((((https?|ftps?)+:\/\/)+(www.)?)+(\.(?!www\.)|[-a-zA-Z0-9_@~:/?#\[\]@!$&'()*+;=]|[^\x00-\xFF]|[^\u0000-\u0080]|[\u0600-\u06FF])+((\.+(com|net|ws|de|mp|bar|[^\x00-\xFF]|[^\u0000-\u0080]|[\u0600-\u06FF]))|:[0-9]|\.[0-9]{3}|\.[0-9]{2})+((\/(?!\.))+([-a-zA-Z0-9_@~:/?#\[\]@!$&'()*+;=]|[\x00-\xFF]|[\u0000-\u0080]|[\u0600-\u06FF]))?)
```
I didn't add a full list of TLDs to the example, because here's the full list… http://data.iana.org/TLD/tlds-alpha-by-domain.txt
http://regexr.com/3ba1q
* * *

So I went and created a testing system for our little pet project that outputs if you've passed test cases or failed them, which you can find [here](http://live.pandacoder.info/scripts/urlregextest.html).
(if you didn't click "here": [http://live.pandacoder.info/scripts/urlregextest.html](http://live.pandacoder.info/scripts/urlregextest.html))

**Something to note**: RegExr doesn't do the job properly, and I'm not sure why. It comes close, but for some reason RegExr cuts the Arabic URL off because of the \n character, which is part of \s. In my tester, which uses the regular JavaScript RegExp object, the Arabic URL is handled properly.
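
To show what "\n is part of \s" means in plain JavaScript, here is a tiny sketch. The pattern is a simplified stand-in, not any of the contest entries, and the sample text is made up. It says nothing about what RegExr does internally.

```javascript
// Simplified stand-in pattern, not a contest entry.
const simple = /https?:\/\/[^\s]+/;

// \n counts as whitespace, so \s matches it.
console.log(/\s/.test("\n")); // true

// [^\s]+ therefore stops at the line break.
const hit = "http://example.com/a\nnext line".match(simple);
console.log(hit[0]); // http://example.com/a
```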

**Score**:
- if you were supposed to match it and you did, +1 point
- if you were not supposed to match it and you did, -1 point
- if it was a custom test that I added it does not affect your score (these are marked [OPTIONAL], and should be located at the bottom of all the test cases)

**Clarifications I made**:
One thing I did do is say that a partial match doesn't count, only full matches: e.g. in "http://google.com /querystring", "http://google.com" is a valid URL, but the whole thing is not.
The reason I say this is the appropriate way to go about it is that the purpose of our RegEx is to parse URLs in chat messages, which will most likely contain other spaces.
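
A minimal sketch of that "full matches only" rule, assuming the check is simply "the matched text equals the whole test string". The `isFullMatch` helper and the short pattern are made up for illustration and are not the tester's actual code.

```javascript
// Hypothetical helper: a test case passes only if the match covers the whole input.
// Assumes `regex` has no g flag, so match() returns the first match with its text at [0].
function isFullMatch(regex, input) {
  const m = input.match(regex);
  return m !== null && m[0] === input;
}

// Simplified stand-in pattern, not a contest entry.
const simple = /https?:\/\/[^\s]+/;

console.log(isFullMatch(simple, "http://google.com"));              // true
console.log(isFullMatch(simple, "http://google.com /querystring")); // false, only a partial match
```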

Instead of the leeway I just reasoned my way into giving, a restriction should be that your RegEx must function without ^$ around it, because a regex that requires them is no longer realistic: ^some regex$ matches from the start to the end of the string, unless you're using a RegEx engine that splits on new lines. From what I can tell, the JavaScript RegEx engine treats ^ and $ as the beginning and the end of the whole string, and $ does not match at line breaks.
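
Here is a short sketch of why anchored patterns don't work for chat. The patterns and the chat line are made up for the example, and the last two lines show the default JavaScript behaviour of $ with and without the m flag.

```javascript
// Simplified stand-in patterns, not contest entries.
const anchored = /^https?:\/\/[^\s]+$/;
const unanchored = /https?:\/\/[^\s]+/;
const chat = "check this out: http://example.com/page";

console.log(anchored.test(chat));      // false, the message does not start with the URL
console.log(unanchored.exec(chat)[0]); // http://example.com/page

// Without the m flag, $ only matches at the very end of the string.
console.log(/^http:\/\/a\.b$/.test("http://a.b\nmore"));  // false
console.log(/^http:\/\/a\.b$/m.test("http://a.b\nmore")); // true
```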

**By the way, my entry (388 characters, but given the clarifications I made above it matches every single test case that is currently present):**
```
((?:(?:https?|ftp):\/\/)((?:[^\s:@]+(?::[^\s@]+)@)?(?:(?:(?:(?!(?:22[4-9]|2[3-5][0-9]|127|10|0))(?:2(?:[0-1][0-9]|2[0-3])|1[0-9]{1,2}|[1-9][0-9]|[1-9]))\.)(?:(?:2(?:[0-4][0-9]|5[0-5])|1[0-9]{1,2}|[1-9][0-9]|[1-9])\.){2}(?:(?:2(?:[0-4][0-9]|5[0-4])|1[0-9]{1,2}|[1-9][0-9]|[1-9]))|(?:(?!.+\.\/)(?:[^\s\-\.](?:(?!\-\-+)[^\s\.]*?[^\s\-])?\.)+[^\s\.\-0-9:\/]+))(?::[0-9]{1,5})?(?:\/[^\s]*)?))
```
@Marsh: I'll be on the shoutbox sometime in the morning if you have any comments about anything I wrote here.
* * *

That is amazing panda, thanks a lot. I have updated the scores on the first page.

**Rankings:**
1st. PandaCoder - 36 points.
2nd. TehDoug - 24 points.
3rd. Lemon - 24 points.
4th. Marsh - 11 points.
5th. creatorfromhell - 5 points.