Using RegEx to parse URLs

Posted: 2013-01-11 17:37
by Tekno Venus
Hello AppGini users, I hope you may have some suggestions for my issue. :)

I am setting up a new database system for my website, and the database has well over 3,000 entries. It contains information about many third-party drivers used in Windows.

Anyway, one of the fields in the database contains a URL to the driver manufacturer's support site, to assist users in updating the drivers on their system. However, that field also holds data that is not a URL. Because of the way AppGini parses URLs when the field behaviour is set to URL, all the text in the field is turned into a link, including the parts that are not URLs. For example:
IMAGE

So what I need to do is be able to parse URLs from regular text and then hyperlink them.

First I tried using Rich (HTML) boxes and adding HTML link tags, but that increased the size of the database dramatically, and as my colleagues pointed out, HTML in databases can pose security risks.

Therefore I tried using RegEx. However, I cannot get it to work. The RegEx I found online looks like this:

function jAutoHyperlinkURLs($sInput, $sTarget = "_blank", $sClassName = "") {
	return preg_replace("/\\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/i", "<a href=\"\\0\" target=\"$sTarget\" class=\"$sClassName\">\\0</a>", $sInput);
}
This function accepts three arguments; $sInput, $sTarget (optional), and $sClassName (optional). $sInput is the text you would like to search and create hyperlinks for, $sTarget is the target window that hyperlinks will open in (_blank, _parent, _self, for example), and $sClassName is the CSS class name you would like to use for any hyperlinks that are created. (The function is from here: http://www.endseven.net/php-auto-hyperl ... sing-regex)
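
One caveat with the function above: it leaves the text around each URL unescaped, so any markup already in the field would be output as-is. Below is a minimal sketch of a variant that escapes everything it outputs. It assumes PHP 7.1 or later, and the name linkify_plain_text is hypothetical (it is not part of AppGini or of the linked article). The URL pattern is the same one as above.

```php
<?php
// Hypothetical helper: turn URLs inside plain text into links,
// escaping both the URLs and the text around them.
function linkify_plain_text(string $text, string $target = '_blank'): string
{
    // Same pattern as the function quoted above.
    $pattern = '/\b(?:https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/i';

    // Collect every URL together with its byte offset in the input.
    preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE);

    $out = '';
    $offset = 0;
    foreach ($matches[0] as [$url, $pos]) {
        // Escape the plain text that precedes this URL.
        $out .= htmlspecialchars(substr($text, $offset, $pos - $offset), ENT_QUOTES, 'UTF-8');

        // Escape the URL itself before placing it in the attribute and the link text.
        $safeUrl = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
        $safeTarget = htmlspecialchars($target, ENT_QUOTES, 'UTF-8');
        $out .= '<a href="' . $safeUrl . '" target="' . $safeTarget . '">' . $safeUrl . '</a>';

        $offset = $pos + strlen($url);
    }

    // Escape whatever text remains after the last URL.
    $out .= htmlspecialchars(substr($text, $offset), ENT_QUOTES, 'UTF-8');

    return $out;
}
```

With this approach the database keeps the plain text, and the links are only generated when the value is displayed.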

What I am unsure about is what to do with this. I tried making this hook:

function drivers_before_insert(&$data, $memberInfo, &$args)
{
        $data['source'] = preg_replace("/\\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/i", "\\0", $data['source']);
        return TRUE;
}
But it didn't work. Drivers is the name of the table, and source is the name of the field with the URLs in.
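
A likely reason the hook appears to do nothing is that the replacement string "\\0" stands for the whole match, so every URL is replaced by itself. The sketch below shows the difference using a made-up sample value. The text and the example.com URL are assumptions, not data from the drivers table.

```php
<?php
// Same pattern as in the hook above.
$pattern = '/\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/i';

// Made-up sample value for the "source" field.
$source = 'Support page: http://example.com/drivers (registration required)';

// Replacing each match with \0 (the match itself) returns the input unchanged.
$unchanged = preg_replace($pattern, '\\0', $source);

// Wrapping \0 in an anchor tag is what actually produces a link.
$linked = preg_replace($pattern, '<a href="\\0">\\0</a>', $source);

var_dump($unchanged === $source); // the first call is a no-op
echo $linked, PHP_EOL;
```

Two further points, as far as the AppGini hooks documentation describes them: a before_insert hook only runs when a new record is added, so the existing entries would not be touched either way. Wrapping the match in a tag inside this hook would also store HTML in the database, which is what the earlier attempt was trying to avoid.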

So, I am coming to the forums for help. I know very little PHP (but give me a BSOD and I'll fix that for you! :D) so I wondered if anyone could lend me a hand. I would really appreciate it. :) I'm open to all suggestions and ideas, I'm not sure I'm even using the right hook. Would the tablename_dv() hook be better since it does say in the help file:
$html (passed by reference so that it can be modified inside this hook function) the HTML code of the form ready to be displayed. This could be useful for manipulating the code before displaying it using regular expressions, ... etc.
If no one can think of a way to possibly do this, I think it would be a good idea for future releases, since I am sure I'm not the only one with the need to parse URLs from text. Lots of web software can do this, it is surely not too hard to implement. I feel that I've got close, but none of the above scripts have worked. I have purchased the full, pro version of AppGini.

Thank you for your help in advance.

Re: Using RegEx to parse URLs

Posted: 2013-01-18 18:01
by Tekno Venus
**bump**

Just bumping this - surely someone has an idea! I've seen some complex code posted around here, so surely mine can't be too hard. If any further information is required, I can provide it either in a post or via PM.

Thanks!

Re: Using RegEx to parse URLs

Posted: 2013-01-19 16:20
by Johnk
Hi Tekno Venus, I can't help you, but I'm trying to think where you would mix URLs with normal text in a database. Having designed databases since dbII in the early 80s, I've always separated URLs from other fields and text blocks. Maybe this is one where you need to bring the mountain to Mohammad. Will your users be annoyed if the URL is separated? Probably not.

Many hours of frustration can be avoided by rethinking a vexing issue.

I know it wasn't very helpful . . . but!

Re: Using RegEx to parse URLs

Posted: 2013-01-20 09:55
by Sergio
Yep, I tried to help, but to be honest I always come back to the same question: why on earth would you have URLs mixed with text?? :P Can you post an example of your db data? I can't imagine a good reason for that mix in a db :) Sorry, I can't be more helpful without an example.

Re: Using RegEx to parse URLs

Posted: 2013-01-27 17:55
by Tekno Venus
Thanks for the replies both of you.

You can see our current database set-up here: http://sysnative.com/drivers/. This is a custom coded solution but has many issues. You can see in the source column how URLs and text have been mixed together.

Re: Using RegEx to parse URLs

Posted: 2013-01-27 20:51
by Sergio
Why don't you use the Source column as a link column, and add a new column named, for example, "link note" where you put the note about the link? Then you don't need anything complicated and the db is cleaner (links with links and notes with notes). I know it's not a "programming solution", but it's the first thing that popped into my mind :)
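
If the existing entries were split this way, the same URL pattern from the first post could do most of the work. The following is a rough sketch only, assuming PHP 7 or later. The function name split_source and the keys link and link_note are hypothetical, and it keeps only the first URL it finds, so fields with several URLs would need handling by hand.

```php
<?php
// Hypothetical helper: split a mixed "source" value into a URL part
// and a note part, for moving existing data into two columns.
function split_source(string $source): array
{
    // Same URL pattern as in the first post.
    $pattern = '/\b(?:https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/i';

    if (preg_match($pattern, $source, $m)) {
        // Remove the URL from the text, then tidy up leftover whitespace.
        $note = str_replace($m[0], '', $source);
        $note = trim(preg_replace('/\s+/', ' ', $note));

        return ['link' => $m[0], 'link_note' => $note];
    }

    // No URL found: everything is treated as a note.
    return ['link' => '', 'link_note' => trim($source)];
}

print_r(split_source('http://example.com/support (registration required)'));
print_r(split_source('Included with Windows'));
```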

Re: Using RegEx to parse URLs

Posted: 2013-01-30 18:19
by Tekno Venus
Sadly, I think that we need the data mixed together. We already have enough columns and some drivers really need the mixed text. To give you some examples:

http://sysnative.com/drivers/driver.php?id=a2util64.sys
http://sysnative.com/drivers/driver.php?id=CM2793.sys

Re: Using RegEx to parse URLs

Posted: 2013-02-20 18:30
by Tekno Venus
Does no one else have any ideas on how to write this code? I've tried working around it with different columns and so on, but the only real solution I can see is to use a RegEx to parse the URL out of the plain text, unless anyone has another idea?

Thanks,
Stephen