CODEgrunt blog

Commentary and insight on web development and the Internet at large written with a wry smile and a hungry look.

jargon wars: URI versus URL - Wed Sep 16, 2009

Sometimes reference documentation can do more to confuse than to help. Take the terms "URI" ("Uniform Resource Identifier") and "URL" ("Uniform Resource Locator"). You will often see these acronyms tossed around as if they were radically different beasts, and this can lead to confusion when sorting out what the author's intent is.

For those just wanting the quick answer, a URL is just a specific type of URI. They are not different things. Now for the long answer. . .

Back in ye olde days of the early World Wide Web there was a different view of how web pages (and other services) were going to be found. A URI represented the entire hierarchy of methods for finding resources, and these were broken into 3 discrete types (with the option to add more later). The 3 types were URL, URN and URC (with the latter, "Uniform Resource Characteristic", never gaining common use). URLs were to be the addresses that machines would use to locate services, and then there were URNs ("Uniform Resource Names") which humans would use to find these services. The intent was that humans would never see URLs at all. Much like how DNS maps human readable host names like "codegrunt.com" to numerical IP addresses, browsers would hide URLs behind more readable URNs, using some external service to do the conversion between them.

Well, this approach never really took off as people (and programmers) started using URLs directly. While URNs still have a place in modern infrastructure, they are no longer thought of as a discrete type of URI. The W3C covers this pretty well:

URIs, URLs, and URNs: Clarifications and Recommendations 1.0

. . . according to the contemporary view, the term "URL" does not refer to a formal partition of URI space; rather, URL is a useful but informal concept: a URL is a type of URI that identifies a resource via a representation of its primary access mechanism (e.g., its network "location"), rather than by some other attributes it may have. Thus as we noted, "http:" is a URI scheme. An http URI is a URL. The phrase "URL scheme" is now used infrequently, usually to refer to some subclass of URI schemes which exclude URNs.

Unfortunately, even documentation writers do not always get the distinction fully right, which only adds to the confusion:

PHP Documentation: parse_url()

Note: This function is intended specifically for the purpose of parsing URLs and not URIs. However, to comply with PHP's backwards compatibility requirements it makes an exception for the file:// scheme where triple slashes (file:///...) are allowed. For any other scheme this is invalid.

The above note suggests that URIs and URLs are not the same thing, which is not true. What it should say is:

Note: This function is intended specifically for the purpose of parsing URIs that use the "http" scheme and, for backwards compatibility reasons, the "file" scheme.
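To make the "a URL is just a type of URI" point concrete, here is a minimal sketch of parse_url() applied to an http URL and, for contrast, a URN. The two example strings are my own assumptions and are not taken from the PHP manual.

```php
<?php
// Example inputs are illustrative assumptions, not taken from the PHP manual.

$url = parse_url('http://codegrunt.com/blog?topic=jargon');
$urn = parse_url('urn:isbn:0451450523');

// An http URI names its access mechanism, so parse_url() finds a host to contact.
echo $url['scheme'], ' host=', $url['host'], ' path=', $url['path'], "\n";

// A URN is also a URI, but it has no authority part, so no "host" key is returned.
echo $urn['scheme'], ' host=', $urn['host'] ?? '(none)', ' path=', $urn['path'], "\n";
```

Both strings are URIs. Only the first one says where to go to fetch the resource, which is the informal sense in which it is a URL.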

Ah, jargon. How I hate, er, love you.

topics: Internet, jargon, PHP

simple MySQL handler class - Mon Aug 31, 2009

MySQL has come a long way since its early days, what with stored procedure support and such. That said, the usage you will find in the average web application is still going to be limited to basic INSERT, UPDATE and SELECT queries. While there is no shortage of MySQL abstraction classes, all of the ones I have come across either add too much overhead or do not really make the process of retrieving data from the backend any quicker or easier. Thus the creation of this MySQL handler class.

The intent with this class is not to be pretty or completist. Its goal is to provide the core functions needed for the average web application while keeping things simple. Query results are returned as a multidimensional array using associative column names. It offers a few "magic" values by default ("insert_id" and "numrows") and has some utility functions thrown in for dealing with the result set.

I do not claim this is the most beautiful MySQL class out there. I can say, however, that it makes life a lot easier in many situations compared to either dealing with the database directly or using a full scale database abstraction layer.

A simple MySQL handler class.

validating credit card numbers - Mon Aug 17, 2009

This code is pretty old now but still useful for basic checks against incoming credit card data. In most situations these days, this logic will be handled by libraries supplied by the payment processor itself, but it is still often useful to trim out the garbage data before you get to that point.

Source code for this simple credit card validation class can be found here.
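The linked class is not reproduced in this text. As a rough idea of what a "basic check" can look like, here is a minimal sketch of the Luhn checksum, the standard mod 10 check digit used by card numbers. The function name luhn_is_valid() and the 12 to 19 digit length guard are my own assumptions, not the API of the class above.

```php
<?php
// Sketch of a Luhn checksum test; luhn_is_valid() is a hypothetical name,
// not the API of the class linked above.

function luhn_is_valid(string $number): bool
{
    // discard the spaces and dashes that people commonly type
    $digits = preg_replace('/[\s-]+/', '', $number);

    // reject anything that is not 12 to 19 digits (a rough, assumed length guard)
    if (!preg_match('/^\d{12,19}$/', $digits)) {
        return false;
    }

    $sum = 0;
    $double = false;
    // walk from the rightmost digit, doubling every second one
    for ($i = strlen($digits) - 1; $i >= 0; $i--) {
        $d = (int) $digits[$i];
        if ($double) {
            $d *= 2;
            if ($d > 9) {
                $d -= 9; // same as adding the two digits of the product
            }
        }
        $sum += $d;
        $double = !$double;
    }

    return $sum % 10 === 0;
}
```

A passing checksum only means the number is well formed. It says nothing about whether the card exists or has funds, which is still the processor's job.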

topics: PHP, programming

filling up MySQL's fulltext search - Fri Aug 14, 2009

I noticed pretty quickly when I started using MySQL's fulltext search capability that a lot of the things I was searching for were missing from the results. It depends on your content of course, but on a stock MySQL install the fulltext matching functions will skip over terms like "CMS" (too short) and "value" (a "stop word"). Luckily, there is a simple way to keep using the nice features provided by fulltext indexes while making sure that important terms are not skipped: use up some more disk space.

The solution here is to keep two copies of your data - one for display purposes and a second massaged version which you search against. This second version needs to have the following done to it via a custom "searchify" method:

  • strip out unwanted markup and characters
  • strip out any unwanted stop words
  • add padding characters to words that are too short
  • add padding characters to wanted stop words to avoid matching MySQL's internal stop word list

This searchify method also needs to be run against any search query so as to best match the massaged search data.

Here is example code showing this method in action.
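The linked example is not reproduced in this text. As a rough stand-in, here is a minimal sketch of what such a searchify function could look like, following the four steps listed above. The function signature, the two word lists passed in and the constant names are all assumptions on my part. The padding suffix is borrowed from the example query further down, and 4 is MySQL's default minimum word length.

```php
<?php
// Hypothetical sketch of a "searchify" helper; names and values are assumptions.

const SEARCH_PAD = 'zzgghhn'; // padding suffix, borrowed from the example query
const MIN_WORD_LEN = 4;       // MySQL's default minimum fulltext word length

function searchify(string $text, array $dropWords = [], array $keepStopWords = []): string
{
    // strip markup (adding a space first so adjacent tags do not glue words together)
    $text = strtolower(strip_tags(str_replace('<', ' <', $text)));

    // reduce everything but letters and digits to single spaces
    $text = preg_replace('/[^a-z0-9]+/', ' ', $text);

    $out = [];
    foreach (preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY) as $word) {
        if (in_array($word, $dropWords, true)) {
            continue; // unwanted stop word: leave it out of the search copy
        }
        // pad short words and wanted stop words so MySQL will index them
        if (strlen($word) < MIN_WORD_LEN || in_array($word, $keepStopWords, true)) {
            $word .= SEARCH_PAD;
        }
        $out[] = $word;
    }

    return implode(' ', $out);
}
```

Under these assumptions, searchify('PHP') gives "phpzzgghhn". The same call has to be applied both to the stored search copy and to the incoming search string, otherwise the padded terms will never match.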

If you wanted to search this blog for the text "PHP" the MySQL query would look something like this:

    -- example query showing how to use FULLTEXT search
    SELECT blog.id,
           ( MATCH(blog.data_search) AGAINST ("phpzzgghhn" IN BOOLEAN MODE) ) AS relevance
    FROM blog
    WHERE ( MATCH(blog.data_search) AGAINST ("phpzzgghhn" IN BOOLEAN MODE) )
    GROUP BY blog.id
    ORDER BY relevance DESC, blog.created

In the above query, the search string "php" has been padded with extra characters to get it past the default 4 character minimum word length. I should also note that the query could be expanded to give better relevance sorting by taking advantage of MySQL's boolean fulltext search options. The obvious additions are extra weighting when all words match and phrase detection (that's for another blog posting).

For reference, here is MySQL's list of stop words.

image quality versus page load speed - Thu Aug 06, 2009

So what is the answer to the following question?

100 - 25 = ???

Well, if we are talking JPG quality levels and PHP's GD-based image resizing functions, then the answer is a roughly tenfold reduction in file size!

A customer was recently commenting on slow page loading for a site of theirs. Taking a closer look, I noticed that the image resize class being used was defaulting to 100 for the quality argument of PHP's imagejpeg() function. Adjusting this down to 75, plus adding in a bit more on the fly thumbnail creation for larger images, took their index page from a startling 7MB down to around 600KB (still high, but this is an image heavy site).
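For illustration, here is a minimal sketch of a thumbnail helper that passes an explicit quality value to imagejpeg(). It is not the resize class from the customer's site. The function name save_thumbnail() and its default width are assumptions, and it needs PHP's GD extension to be installed.

```php
<?php
// Sketch only: save_thumbnail() and its defaults are illustrative assumptions.
// Requires the GD extension.

function save_thumbnail(string $src, string $dest, int $maxWidth = 200, int $quality = 75): bool
{
    $image = imagecreatefromjpeg($src);
    if ($image === false) {
        return false; // unreadable or not a JPEG
    }

    // only shrink, never enlarge
    if (imagesx($image) > $maxWidth) {
        $scaled = imagescale($image, $maxWidth); // height follows the aspect ratio
        if ($scaled !== false) {
            $image = $scaled;
        }
    }

    // third argument is the JPEG quality, 0 (smallest file) to 100 (best quality)
    return imagejpeg($image, $dest, $quality);
}
```

How much you actually save depends on the images involved, so it is worth comparing file sizes at a couple of quality settings before settling on a default.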

These things are easy to miss when you are developing on a local desktop or LAN hosted server. A quick peek at the "View Page Information" menu provided by the excellent Firefox Web Developer plugin is a great way to check whether you are going well over your bandwidth budget for your current project.
