Community Blogs
July Slides
Speed Up Your MySQL Queries Using MD5 and Indexing
So I created a poorly constructed 'SELECT' query in a stored procedure and I was up late last night trying to figure out how I could best optimize it. After sitting in my 94 degree basement for a few hours, I thought of using hashed values via CONCAT and MD5 functions to create an index of multiple columns to improve 'SELECT' queries using 'JOINS.' The performance increase was instant after I made the necessary adjustments. To save others the time and headache, I'm going to share with you my solution and the resulting performance improvements.
So, I am dealing with two tables called "meetup_events" and "location" containing 1,000+ and 290,000+ records, respectively. Both tables contain information for:
- location name
- location address
- location city
- location state
- location zip
- location latitude
- location longitude
- location phone
location table Field Type Null Key Default Extra lid int(10) unsigned NO PRI NULL auto_increment name varchar(255) NO street varchar(255) NO additional varchar(255) NO city varchar(255) NO province varchar(16) NO postal_code varchar(16) NO country char(2) NO latitude decimal(10,6) NO 0.000000 longitude decimal(10,6) NO 0.000000 source tinyint(4) NO 0 is_primary tinyint(4) NO 0
The data fields that I need to extract include locations from the meetup_events table that are not in the location table. The query that I was using originally is:
SELECT DISTINCT
`me`.`venue_name`,`me`.`venue_address`, `me`.`venue_city`,`me`.`venue_state`,
`me`.`venue_zip`,`me`.`venue_phone`, `me`.`venue_lon`,`me`.`venue_lat`
FROM `meetup_events` `me` LEFT JOIN `location` `l`
ON( `me`.`venue_address` = `l`.`street`
AND `me`.`venue_state` = `l`.`province`
AND `me`.`venue_city` = `l`.`city`)
WHERE `me`.`venue_lat` <> 0 AND isnull(`l`.`name`)
Unfortunately, there is no immediate way of identifying a relationship between the location associated with each meetup_event and a record in the location table that uses an index because the meetup_events table contains data that is imported from a third-party source. This is where I took a wrong turn and JOIN'ed the tables using the address, state, and city fields.
...
ON( `me`.`venue_address` = `l`.`street`
AND `me`.`venue_state` = `l`.`province`
AND `me`.`venue_city` = `l`.`city`)
...
This didn't pose an issue when the number of records in the location table was smaller, but as the table grew it failed to scale and resulted in ERROR: query execution was interrupted. Not fun!
So after thinking about the minimal data that I would need to constuct a unique identifer for a given location, I decided to turn my attention to the location/venue name, latitude, and longitude data. This will take care of potential duplicates such as locations with mutliple venues like malls, multi-storied buildings, etc.
Now that I have figured out that aspect, I must store the data in a format that is optimial for the MySQL to quickly read and index. Without thought, I turned to the CONCAT() function to provide a single field to store the data and return a scalar value to store.
CONCAT(name, latitude, longitude);
To ensure that the data remains relatively short and unique, the MD5() function comes into play which will result in a hashed value of the concatenated value returned by CONCAT(name, latitude, longitude).
MD5(CONCAT(name, latitude, longitude));
Example: SELECT MD5(CONCAT('The Park at 14th', 38.879456, -76.985059));
e3dd1caa1253f2f5def9caadc3e474b7
Using this means of generating a unique value, both tables (meetup_events and location) are given a new field called, hashed_index of type varchar(100) with an INDEX.
meetup_events table (with new field) Field Type Null Key Default Extra id int(11) NO PRI NULL name varchar(100) NO NULL description text NO NULL time timestamp NO CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP group_id varchar(100) NO NULL group_name varchar(100) NO NULL group_photo_url varchar(200) NO NULL fee varchar(100) NO NULL how_to_find text NO NULL organizer_name varchar(100) NO NULL nid int(11) NO NULL status int(11) NO 0 venue_name varchar(100) NO NULL venue_address varchar(100) NO NULL venue_city varchar(100) NO NULL venue_state char(5) NO NULL venue_zip char(5) NO NULL venue_phone varchar(15) NO NULL venue_lon decimal(10,6) NO NULL venue_lat decimal(10,6) NO NULL hashed_index varchar(50) NO MUL NULLlocation table (with new field) Field Type Null Key Default Extra lid int(10) unsigned NO PRI NULL auto_increment name varchar(255) NO street varchar(255) NO additional varchar(255) NO city varchar(255) NO province varchar(16) NO postal_code varchar(16) NO country char(2) NO latitude decimal(10,6) NO 0.000000 longitude decimal(10,6) NO 0.000000 source tinyint(4) NO 0 is_primary tinyint(4) NO 0 hashed_index varchar(50) NO MUL NULL
Here is the query rewritten and optimized:
SELECT DISTINCT
`me`.`venue_name`,`me`.`venue_address`,
`me`.`venue_city`,`me`.`venue_state`,
`me`.`venue_zip`,`me`.`venue_phone`,
`me`.`venue_lon`,`me`.`venue_lat`
FROM `meetup_events` `me` LEFT JOIN `location` `l`
USING( hashed_index)
WHERE `me`.`venue_lat` <> 0 AND isnull(`l`.`name`);
If you have any multiple column indexes within any of your tables, try indexing a hashed version of those fields concatenated, and then test to see if this results in a performance increase. Good luck!
GPL: WordPress, Thesis, web2project, and Beyond
For those of you just tuning into the WordPress/Thesis battle, here is the current - as of 19 July 2010 - state of things:
- At some point recently, the servers of Chris Pearson - owner and distributor of the Thesis theme for WordPress - had a vulnerability on his server and the latest releases of Thesis were compromised.
- When that came to light, Matt Mullenweg - head of the WordPress project, founder of Automattic, and founder of the WordPress Foundation - has previous stated that "WordPress Themes are GPL too!" sent a simple Tweet: "This is what happens when non-coders think they can code."
- In a matter of moments, that was being reTweeted, argued, discussed, blame was being spread around, and things started getting heated.
- Andrew Warner had the insight to get both Chris and Matt on the phone for a Mixergy interview. Chris came off poorly (at best) when he "explained" that the GPL didn't apply to Thesis and further claimed that he was one of the three most important people in WordPress. Things exploded from there.
- Matt raised the stakes by making a public offer to buy any existing Thesis customer a new premium theme from somewhere else. And then started tweeting individual Thesis customers making the offer.
- And igniting things further, Andrew Nacin - another core WordPress developer - and a few others (Andy Peatling, Drew Blas) started digging and found snippets of GPL'd WordPress code in Thesis.
- Next, a commenter on Matt's update post (titled: "Syn-Thesis 1 and Chris Pearson") who happened to be a former Thesis developer admitted to copy/pasting things from WordPress because he didn't understand the license.
- Finally, Chris says that the offending code has or will be removed but Mark Jaquith (and the Drupal and Joomla communities) wrote "Why WordPress Themes are Derivative of WordPress" so offending code or not, it doesn't matter.
So what does this all mean to the community?
Good question... As my partner Marco Tabini notes in "WordPress, the GPL, and cherries on top" everyone has an opinion on what the GPL means and what its ramifications are but since there's no legal precedent, it's just a best guess... and there has yet to be a precedent to solidify an interpretation. In the meantime, the most common interpretation is based on the Software Freedom Law Center's opinion and the GNU FAQ. Here's the problem with that:
The FAQ is not part of the license and not distributed with it. It is stored on a website without version control or an audit trail on who might have modified it when. By all accounts, it is less reliable than Wikipedia because even errors can't be fixed.
Feel free to cite the FAQ all you want.. no one ever explicitly or implicitly agreed to that interpretation.
If you combine this with Matt and Mark's opinion that themes are GPL because they're dependent on WordPress code and datastructures to have meaning, what are the ramifications of this interpretation?
- What is the threshold where including code crosses from "Fair Use" into "now this must be GPL"? While there are lots of ways to do things in PHP, if there are Best Practices, we'll all tend to reach similar results.
- For client work, if copyright isn't transfered with the delivery of the code, isn't this distribution? And if so, doesn't the client work have to be GPL? Or if you transfer the copyright at delivery, what about the demo versions? Did you send them copies of the code before you transfered copyright?
- If the client work does have to be GPL, do you have to make it available to anyone who asks for it? Since the GPL doesn't requite a "public disclosure", it's possible no one would know to ask.. but what about employees of either group? If it's "privately" GPL, would an employee be within their rights to take and use a copy?
- What does this do to Non-Disclosure Agreements? If the code must be GPL and I have to provide a copy to whoever might ask, how can I agree to protect the secrets of my customers?
- What about a book that covers GPL code and datastructures? What if the book goes as far as including snippets of WordPress code itself? While the book may be able to physically exist and function independent from the code, it doesn't have any meaning without the core system. Should Lisa Sabin-Wilson GPL her book "WordPress for Dummies"? Does Aaron Brazell have to GPL his "WordPress Bible"?
- A given WordPress post consists of a title, body, and a category or some tags. It's pretty trivial to turn that into a post on the page without using WordPress at all. But what about a complex data structure that only has meaning once it is transformed by the core code?
That last question is the point of the discussion we've had within the web2project team recently.
If you're not familiar with web2project - or its parent dotProject - the data in the database doesn't mean much by itself. The core code must retrieve the data and process it to express a project plan in a meaningful way. It does the same to files uploaded, tasklogs stored, and a number of other things within the system.
Therefore, since the data is wholly dependent on the core web2project code, does the data itself have to be GPL?
And that's why a few weeks back, we started the process of changing licenses from GPL back to BSD as dotProject originally was. I'll go into more detail on the process and due dilligence involved, but know that we've been on this for quite a while and are working to have it fully resolved before the next web2project release.
Disclosures: I don't have an interest in Thesis as I've never used it and don't know Chris Pearson at all. I use Automattic's Akismet for blocking spam on a number of Drupal sites. Finally, I am using WordPress on the web2project site relaunch and was a reviewer in Aaron Brazell's WordPress Bible and wrote the foreword. My biggest concern in all of this are the larger implications on web2project and GPL projects in general.
WordPress, Thesis, and GPL Heartache
Several battles have been brewing from within the WordPress community.
It's WordPress, not Wordpress dangit!
A couple weeks ago, a snippet of code got sneaked into WordPress 3.0. The function, capital_P_dangit, is a filter that replaces "Wordpress" with "WordPress" throughout one's site.
While this doesn't seem like a big deal, this change infuriated some users. They claimed that WordPress does not own a user's content, and therefore, such a move (without first explicitly asking to do so) infringes on the user's expressive freedoms.
The filter itself doesn't bother me personally. What bothers me is the lack of review. This change produced a rarely-occuring permalink bug in WordPress 3.0. This "feature" was added with barely any discussion, and hopefully isn't an indication of things to come.
Themes and Plugins Must be GPL
WordPress is licensed under the GPL (v2). It's been clearly stated that all WordPress plugins and themes should be licensed under the GPL as well.
As written on Perpetual Beta, "Anyone who gets a copy of a premium theme then has the right to freely distribute it or modify it virtually without restriction (expect, of course, those restrictions found in the GPL itself)".
The popular paid theme, Thesis, does not adhere to the GPL license. The authors refuse to.
The GPL (version 2)
The GPL ensures that software is freely redistributable, but not necessarily without a price. You've probably heard the phrase "free as in free speech, not as in free beer." GPL software is meant to protect its users (and the community), not the authors. Users have four essential freedoms:
- The freedom to run the program, for any purpose (freedom 0).
- The freedom to study how the program works, and change it to make it do what you wish (freedom 1). Access to the source code is a precondition for this.
- The freedom to redistribute copies so you can help your neighbor (freedom 2).
- The freedom to distribute copies of your modified versions to others (freedom 3). By doing this you can give the whole community a chance to benefit from your changes. Access to the source code is a precondition for this.
Thesis is Special, Right?
The authors, particularly Chris Pearson, fear that switching to GPL will de-value their product. It'll allow for knock-off themes that are strikingly similar to Thesis — but for a fraction of the cost.
Under the GPL license, free distribution is encouraged -- as long as the product (or derivative) remains under the GPL.
The other argument is that although Thesis requires WordPress to function, the vast majority of the theme's code is totally original. In that respect, some argue that it may not fall into the "derivative work" category, which mandates the GPL stamp.
Chris claimed during a live call that WordPress cannot lawfully force Thesis to adopt the GPL. He coined the GPL as a "flimsy and unenforceable license".
No, says WordPress
The creator of WordPress, Matt Mullenweg, argued that licensing Thesis under the GPL was "the right thing to do." He argued that commercial GPL themes (and plugins) can still be profited from, and there are other effective models at earning a steady income (such as through offering services and/or paid support).
Most notably, those supporting Matt's stance argue, "If you don't agree with the GPL, then don't develop for WordPress."
Although both sides have decent points, I was a little disappointed by all the smearing via Twitter. At one point, Matt essentially told followers not to use Thesis, and (in response to a Thesis vulnurability claim), "This is what happens when non-coders think they can code." This isn't saying that Chris wasn't without flaws, as his arrogance shined during the live call: "I am one of the three most important people in WordPress."
Conclusion
This is an ongoing battle, with both sides clinging to their interpretations of the GPL license. I don't see either side giving up very easily, so this dispute could likely find its way into court. I'm excited to see how this unfolds, as a decision will affect not just WordPress, but many other open platforms (Drupal, Joomla, etc.) as well.
What are your thoughts?
Life with Quercus: PHP Via Java
There has been a lot of (virtual) ink spilled over HipHop, Facebook's new PHP-to-C compiler; but a similar approach has flown a little more under the radar. I'm talking about Quercus, a PHP interpreter/compiler written in Java by Caucho, the makers of the Resin app server.
Quercus is a servlet that can be deployed in any J2SE/J2EE servlet container — Resin, Tomcat, JBoss, WebSphere, etc. — and can run PHP applications deployed in the same webapp. Of course the question is why would you use Quercus when you can use Zend's PHP interpreter?
Unlike HipHop where the biggest reason to use it is performance, the case for Quercus is slightly more nuanced. Quercus has more capabilities than HipHop out of the box; for instance you can drop Drupal, WordPress, Joomla, MediaWiki, Gallery2 and more in Quercus and they run. Need to use eval (I'm looking at you, Drupal)? No problem.
Similarly, unlike HipHop which contains a built-in web server, Quercus is a servlet which provides you more options on how you run it. And because we're running on Java we have access to all the tasty goodies that people have written for it. Need asynchronous messaging? Love JMS? Multi-node caching? Terracotta and Ehcache rocks — interestingly enough someone already did this with Drupal. The list goes on and on...Apache Axis is great for web services, GraniteDS does almost as much as Adobe LifeCycle to connect with Flex.
When you look at the documentation for Quercus, they talk about performance being at least comparable to Apache/PHP, and they're mostly correct. Deploying a fairly intensive community Drupal site (OG, Spaces, lots of Views) on Resin 4 and testing with Apache Bench (AB) on a workstation puts Resin running only slightly slower than Apache, at least up to 50 concurrent users.
There is some room to grow on the Resin side as it was running pure Java, no JNI, and anecdotal testing indicates that this provides some 10-12 percent performance benefit. So out of the box this would put Resin processing requests at about the same speed as Apache.
We were testing the community version of Resin and Quercus, the Professional version has the ability to compile PHP to Java using the same sort of static analysis techniques used by HipHop and, at least according to benchmarks published by Caucho (PDF), show an increase in performace of Drupal, WordPress, and other common PHP applications, ranging from ~17 percent in phpBB to almost 110 percent for WordPress.
So, what's not to love, right? We can take existing PHP applications, move them over painlessly and get both increased performance and capability.
Sadly, things are not that clear-cut. Going back to the example below of the 50 concurrent connections. While performance was roughly equivalent, the effect on system resources was not, Resin used considerably more memory than the comparable test on Apache. Another issue is lack of sophisticated URL rewriting on some servlet containers. Resin supports enough of the functionality of mod_rewrite to make Drupal work, but others do not. Getting Drupal clean URLs in Tomcat, for example, is difficult. Other uses of mod_rewrite may not work at all without significant effort.
And perhaps the biggest downside for Quercus is inherent in what it is. It's Java. For organizations that have made a significant investment in PHP the code is not all there is. You have systems tuned for the LAMP stack. You have system administrators that are knowledgable and experienced in maintaining those systems. You build solutions that use PHP, and while the additional functionality of Java is nice, what do you do with it? But like HipHop, the very existance of Quercus is encouraging. While I don't have a project right now targeting Quercus, and I may never, the fact that companies like Facebook and Caucho are innovating and adding new options, new features makes this an exciting time.
The Ten Commandments of Open Source
Update: To be fair, a great deal of inspiration for this post came from Ed "Funkatron" Finkler and his session at phpWorks 2008 Picachu pitch-at-ya peek-at-you lightning talk called "Users are Assholes" and later by Matthew Weier O'Phinney's lightning talk called "How to get kicked off my Project." Thanks for the fodder guys!
On all sides of software development, there are annoyances. Some of them go beyond annoyances and into pet peeves. Some of them go beyond pet peeves and incite open war within projects and communities. After witnessing a number of these rants - from developers and non-developers - and even sharing a few myself, I thought maybe I could make the situation better.
Therefore, I present here for comment:
The Ten Commandments of Open Source Software.
0. Thou shalt not ask questions without basic research including documentation and the Google.
I. Thou shalt realize that most developers work on passion and freetime. Thou are not paying them.
II. Thou shalt not criticize code and architectures before seeking understanding.
III. Thou shalt not use the phrase "this should be simple" unless thou has confirmed it as such.
IV. Thou shalt apply patches and updates in a timely manner. Further, thou shalt have a vague idea of when the next release is due-eth.
V. Honor thy developers and designers.
VI. Thou shalt include specific error messages and screenshots when thou hast problems. Thou stating "thy software sucks" is not helpful and angers the Funkatron.
VII. Thou shalt not murder.*
VIII. Thou shalt not copy without respecting both the license and the terms within.
IX. Thou shalt not covet the features and functionality in thy neighbor's software.
Did I miss any?
* No, I'm not kidding.
web2project v2.0 Release Notes
As of 29 June 2010, web2project v2.0 is officially released! You can download it from SourceForge now.
Although this release had lots of bug fixes, the primary focus was on a few specific new features and major pieces of functionality. You can read the full v2.0 Release Notes, but in my opinion, the six most important items are:
- User-based Timezones: Everywhere a time is used or displayed within the system, it's now stored in GMT/UTC and presented in the user's local timezone. If you set a meeting for 5pm America/New_York, someone with America/Chicago timezone will automatically see it at 4pm. If you have team members spread across timezones, this is vital. Lots of thanks to Derick Rethans - the master of all things time-related in PHP - on his numerous presentations on DateTime and Timezones. His information made it possible.
- Unit Tests have come a long way since the v1.0 release. We had zero at that time but Trevor Morse - founder of the Halifax, NS PHP Group and core web2project member - has led the way to 240+ tests focused on the Tasks and Projects classes.
- Subprojects are now Useful: Previously you could denote one project as a subproject as another but it didn't really do anything, it was just presented a bit differently. With this release, when you assign a project as a subproject, now it creates a token task within the parent project. This token task takes on the start/end dates, duration, hours worked, and percent complete of the subproject. As the subproject updates, the token task updates. Even more usefully, you can use the Token Task as a dependency for other tasks in the parent project. That all sounds complicated, so just try it out.
- The class structure has been completely restructured: While this won't be relevant to 99% of our users, it makes it much easier to add standalone frameworks - like the Zend Framework or whatever - to the system for other functionality. On a related point, web2project now supports the naming conventions put forth in the Framework Interoperability Group. This will also allow easier third-party authenticators for systems like Drupal, Joomla, or WordPress.
- Audit Logs - We cleaned up the core system objects to provide historical logging of all CRUD operations. Any Add On modules that use the core objects will get this functionality by default.
- Added an 'update checker' - This is a regular script which runs to notify the System Administrator that a new release is available and collects basic system information. This was modeled after Drupal's update functionality. No sensitive information is collected and this can be opted out of via the System Configuration.
A number of community members stepped up and did a great job in reporting issues, helping test fixes and release candidates, and generally being insightful. Special thanks goes to opto, adolfo, zbyszek, and egemme. Without them, v2.0 would not be as solid, useful, and generally as bug-free as it is.
In summary, we closed about 79 items with ranging from 15 crash-level issues to 42 minor bugs. Once again, those are just the formally reported issues. If you want to explore everything of interest, check out the web2project v2.0 Release Notes on our wiki. And of course, if you're looking for ways to share your code more easily, you should check out our web2project git repository.
You can download web2project v2.0 from SourceForge now.
* And yes, I always wait a few days to announce the releases in case we have to make a patch release. ;)
Salsa Rules! A New Drupal Module for Democracy in Action
One frequent request we get from clients is to integrate their Drupal web site user profiles with a Customer Relationship Management (CRM) system so that their users' information can be used for email messaging and other purposes.
We've worked with a variety of CRMs including CiviCRM, SalesForce, Avectra netFORUM, Convio, and Democracy In Action's Salsa.
Salsa has a focus on advocacy activities including creating and signing petitions, generating letters to the editor, and fundraising. so it's a natural fit for a lot of our work with nonprofits, NGOs, and other public policy organizations.
When built EarthDay.org, we created a social networking site for users to register and commit to the cause, and at the end everything had to flow into Salsa to serve as one master list of events, petitions, and individuals for the Earth Day Network.
In order to make the process as flexible as possible, we based our work on the Drupal Rules module, adding actions to create Supporters, Events, Petitions, Campaigns and Groups from nodes in Drupal.
Below is a screencast showing some of the potential. I take you through the process of creating a new user with a Content Profile node and creating a Supporter in Salsa. Then, I create an Event in Salsa from one in Drupal and have a user register for that Event. The result is that having their Supporter record is correctly associated with the Event in Salsa.
This new module is available on drupal.org at www.drupal.org/project/salsa_rules. I hope you find it useful to your work.
MongoDB: A Simple User Directory
In my last post, I glossed over MongoDB from a very high level. While it's useful to know how MongoDB works, I didn't cover exactly how to use it.
There are a growing number of NoSQL systems, and one of MongoDB's greatest strengths is its ease-of-use. It takes a couple of minutes to install the server, and commands are given using the familiar JSON syntax.
Setting up MongoDBI'm running Windows, but installing on a Mac or *nix system shouldn't be much different. After downloading the appropriate package, copy the mongdb folder into c:/data/db/. Then, using your shell (cmd prompt), browse to c:/data/db/mongodb/bin and execute the server executable, mongod.exe:
Voila! The MongoDB server is now up and running. Let's fire up a separate window, this time for an instance of the MongoDB client. The client is mongo.exe, also in the bin directory.
Setting up a User DirectoryWe want to create a simple user directory, and want to store the following information about each user:
name age languagesUnlike with traditional SQL systems, we don't have to create a database schema. You simply create a JSON object containing the data you want, and MongoDB saves it as a document. By default, MongoDB will use the test database. Let's change that to a new database called users:
use users;I know, it was pretty intense. Once you've wiped the sweat from your brow, let's add some data to our new database:
var data = [ {name: "John Doe", age: 28, languages: ["English", "Arabic"]}, {name: "Mike Smith", age: 35, languages: ["English"]}, {name: "Sally Mae", age: 41, languages: ["Italian", "French", "Arabic"]} ];To add these people to our database, we run the following command:
db.users.save(data); Getting all usersTo find all items in the collection, you'd run db.users.find().
This returns our users. If there are more than 10 users, you'll need to enter the "it" command to iterate through the rest of the results. Fortunately, the MongoDB PHP extension makes iterating a lot less tedius.
Getting specific usersYou're able to enter specific parameters into the find() function to retrieve filtered results.
To find "John Doe":
db.users.find({name: "John Doe"});To find anyone 41 years old:
db.users.find({age: 41});To find anyone who speaks Arabic:
db.users.find({languages: "Arabic"});MongoDB also supports comparison operators (<, >, <=, >=, etc), but the format takes a bit to get used to. For example, let's find users between 30 and 40 years old:
db.users.find({age: {'$gte': 30}, {'$lte': 40}});NOTE: "Less than" is '$lt', whereas "Less than or equal" is '$lte'.
SnafusWhen using MongoDB, you'll run into difficulties that were non-issues in MySQL. One example is using "OR" when building queries. In MySQL, it's quick and painless to use OR when finding records. Let's find users <= 30 or >= 40 years old:
MySQL
SELECT * FROM users WHERE age <= 30 OR age >= 40;MongoDB
The "OR" operator was just added to a nightly build of MongoDB, but is unavailable in the current stable version. It's still possible, but it requires a custom '$where' function:
db.users.find({'$where': function() { return this.age <= 30 || this.age >= 40; }}); ConclusionMongoDB seems to be a pretty powerful tool for document storage. It's fairly robust, extremely fast (from what I can tell), and easy to use.
I cannot stress enough that MongoDB (or any NoSQL solution) is not a substitute for all things SQL. If your application has transactional or relational requirements, you're better off sticking with SQL.
MongoDB (NoSQL): An Architectural Overview
MongoDB is one of the forerunners in the NoSQL movement, an effort to promote non-relational, schema-free data stores. It lacks any table JOINs, which avoids performance bottlenecks seen with traditional SQL servers.
Why NoSQL?NoSQL servers are not meant to replace traditional SQL servers. They are meant to handle problems without heavy transactional requirements, but with the potential to massively scale if needed. They work well for providing quick access to large number of documents, serving pages on high-traffic web sites, or delivering streaming media.
Document Storage: Database > Collection > DocumentMongoDB is a document database. Documents are stored in collections, and collections are in turn stored in a database. A collection is similar to a table in MySQL (it's a named group of documents), but a collection lacks any schema.
DocumentsA document contains the actual data, and is stored as a binary JSON object (called BSON). A single document has a storage limit of 4 MB. Queries are expressed as JSON-style objects, making it pretty painless to save and retrieve data:
var data = {first_name: "John", last_name: "Doe", lottery: [42, 16, 29]};
db.users.save(data);
Files can be stored by MongoDB by using the GridFS specification. Since a document can be a maximum of 4 MB, GridFS works by splitting a large file into (usually 256k) chunks before storing it into a files collection. Unfortunately, MongoDB doesn't do any sort of automatic cleanup in the event of a processing error (your collection would be stuck with fragments of the corrupted file).
Horizontal ScalabilityMongoDB uses (auto) sharding, not replication, as a way of achieving high scalability. Sharding essentially involves "breaking your database down into smaller chunks called shards and spreading those across a number of distributed servers. With Mongo, we tend to think of replication as a way to gain reliability/failover rather than scalability. (See this article for more.)
Performance- There's a client driver per language
- CouchDB uses REST
- Documents have a maximum size of 4MB
- This is not changeable
- Memory-mapped files for data storage
- This means that data is limited to around 2GB on 32-bit systems
- MongoDB stores as much data in RAM as possible
- Update-in-place (instead of MVCC, as in CouchDB)
- Written in C++
Stay tuned for my next post, which will cover the creation of a simple web app using Mongo and the PHP driver.
TEK·X is Complete
It's been a week since tekX completed and there are a number of other tekX writeups to read but I thought I'd share one last one from a different point of view. To add some context, I'm not the guy that signs the contracts, approves the expenses, schedules the sessions, arranges the speakers, or anything useful. Due to other project commitments, I basically served as backup for some of those things and then as social chair for the evening events.
First off, I'm proud of our first time tekX speakers. Easily half of the speakers had never spoken at a tek before, off the top of my head that includes - Bill Karwin, David Strauss, Jason Austin, Josh Holmes, Kanwalijeet Singla, Kristina Chodorow, Matt Schmidt, Matt Turland, Ryan Stewart, Sumit Chawla, and myself. Since tek is the Community Conference and it tends to be incredibly technical, it takes a different mindset than many of the other conferences. Even better, a few of those people were first or second time conference speakers period.
Next, based on the feedback from Joind.in, the sessions themselves were consistently solid and "good" at minimum. The speakers did a good job of choosing topics, content, and styles that sparked the interest of attendees. Even better, I regularly heard "I never thought of it that way" in the hallways and at meals. That gets me excited. When someone thinks of something differently or new because of an idea of perspective that a speaker introduced, things can change. They can improve what they're doing. Their organization can improve. The community can improve. Everyone wins.
Next, there were a few sessions that stood out as particularly relevant and interesting. Josh Holmes' opening keynote on The Lost Art of Simplicity. Although I wasn't there, Kristina Chodorow's three hour tutorial on MongoDB topics was received incredibly well and generated lots of discussion. Ryan Stewart's session on the last day included an overview of Data Visualization gave a good overview of tools to build graphs and charts. And despite being an Adobe guy, he did talk about Canvas and non-Adobe options. Finally, I've heard great things about Lorna Jan Michell's session "Open Source your Career".
Next, we had some great sponsors. In alphabetical order: Adobe, Microsoft, SQLyog, and Zend each contributed good things ranging from happy hour drinks to toys to presence to heavy involvement and planning support.
Next, the internet stayed up. No, seriously... it did. We ran out of IP addresses a couple times and it dropped to 1-5kbps at least a couple times, but it was still up. I'm not sure what was so different this year, but it worked.
Finally, once again, the people were amazing. The energy, the excitement, the comradery, the chaos, and the passion was evident. Sure, it takes some people some time to warm up, but once people start connecting, it's hard to avoid it. After tek last year, Trevor Morse (of web2project fame) started the Halifax, Nova Scotia PHP Developers' Group, but this year, this exploded. I moderated a Community User Group Panel with Michelangelo van Dam, Rafael Dohms, Lorna Jane Mitchell, and Ben Ramsey... and apparently, it pushed a few people over the edge to start their own groups.
And with that, I'd like to announce:
Chance Garcia started a local Bloomington, Indiana PHP Developers' Group.
Jeremy Kendall kicked off a the Memphis, Tennessee PHP Developers' Group.
Chris Tankersly started a semi-regional Northwest Ohio PHP Developers' Group.
These guys are taking a step out into their communities with the goal of connecting local developers, employers, recruiters, and everyone else with the goal of making their own skills - and the skills of the people around them - better. If you're in either area, join and help them. If you know people in those areas, pass the word. Even if you're not local but want to lend support, join their lists and contribute.
Even the biggest groups started with a single person with an idea.
Event Driven Programming
During my Live Coding sessions at TEK·X* last week, one of the questions that came up repeatedly was:
Events... what are these Event things and why does Flex work like that?
When you initially dive into the world of Flex development**, most PHP'ers will quickly notice something weird. We're out of the world of Request/Response that we know and understand and into an odd world of Events, Listeners, and Publishers/Subscribers where things just don't play well together.
The familiar Request/Response process is similar to going through checkout to the grocery store. The cashier is idle until a new customer appears. The customer makes their requests, the cashier transfers a specified amount from one account to another, and the customer leaves with their results. As more customers appear and the line gets longer, the stores scales horizontally by adding more cashiers. It all scales quickly and easily with minimal effort because grocery store cashiers are a shared-nothing architecture.
Alternatively, the Publisher/Subscriber (or Observer) pattern is similar to waiting at the airport. When you're waiting for a flight, you sit near your gate and wait for an announcement. In the meantime, you might grab a cup of coffee, use the restroom, check email, or make phone calls. The people around you do the same. As announcements are made, some people board their flights or swear at gate agents but regardless until your announcement, you don't care. But once you you hear your announcement, everything changes. You finish whatever you're doing and take action.. along with anyone else listening for the same flight.
This is the core of Event Driven Programming.
Flex** works the same way. As you create elements and functions/methods, they listen for different Events to "fire". Once the Event occurs, all the Listeners attempt their task. The vast majority of these Listeners will not interact with one another but in some cases, Events will fire other Events.
A common task - like Uploading a File - will consist of a number of these Events:
- First, we have to select the file to upload. This could be done via a drag and drop or an explicit file browser option. Either way, the system sits there and waits until a file is selected.
- Next, we need the file to upload. If it begins immediately - like with Gmail - it probably fires from the previous event. Otherwise, it's likely from a user hitting a "Send" or "Done" button.
- Next, we'd like to provide feedback to the user. Since we know the file's size and we can find out how much has been sent, we can calculate a %-complete to update a progress bar.
- Finally, we need to know when the upload is complete. Most likely we have some sort of "WoohooFileUploadComplete" Event we can capture and let the user know. Hopefully, we'll also get back messages from our destination with error messages/status and we can pass them along for the user.
When you stop to consider it, many of the tasks our code performs can be broken down the same way. I believe the primary reason we don't think of it is because the vast majority of our code is entirely on the backend without the ability to even see or capture our events. Obviously, client side technology like Javascript blurs the line, but the concept is the same. If you're venturing into this world, read up on the Publisher/Subscriber Pattern and the Observer Pattern and figure out how to make your code - both on the frontend and back - play nicely together. Jason Sweat's book on PHP Design Patterns is a good place to start.
* The Live Coding in Flex session went amazingly well. I got a number of great questions and a few people were following along typing exactly what I was as I went. Yes, I had some minor technical issues but the audience was incredibly helpful in pointing out my mistakes. Over. And over. And over again.
** The Publisher/Subscriber model actually applies to most if not all desktop development. It's also familiar to the Javascript guys who work with onClick, onLoad, and all those familiar event-related sounding actions.
Disclosure: Through Blue Parabola, we're working with Adobe and Flex in a number of things. Regardless, I choose to explore Flex for the first time long before that.
Interfaces Make Testing Easier
Putting the X Back in Tek
This week I'm participating in the Tek-X conference, one of the bigger PHP conferences (along with ZendCon) around.
I'm looking forward to some pretty epic geekery on the language itself, as well as scaling with MongoDB, battling users who attempt to "game" your system, and general OO PHP goodness.
If you're here too, say hi. If not, I'll try to pass along some useful bits. Follow the event and its attendees on Twitter with the hashtag #TekX.
MySQL: Open Source Solution for Data Warehousing
At Forum One, MySQL is the open source database that powers most of our internet solutions. It's flexible, powerful and free. Our confidence extends to one one of the more complex and growing needs facing many of our clients: Data warehousing.
Data warehousing is a method for storing organizational data electronically with the intention of extracting it using business intelligence tools for sophisticated analysis and reporting. The market for data warehousing has been increasing every year. More businesses are running enterprise applications that collect valuable information, and larger nonprofit organizations are taking advantage of it as well. (TechSoup has a useful overview of how data warehousing can benefit an organization's operations.)
As organizations embrace these tools, they seek scalable and less expensive software tools. This is where MySQL becomes a good choice.
Use Cases for Data WarehousingHere are the most common uses we see:
- Real-time data warehousing
- Persistent resource between incoming data and already-stored data
- Data that is not active hourly or daily. This is the most widely used case for most businesses. It will store very large amounts of data.
- Historical data being stored in multiple areas. The data can then be used at a later time.
- Data is stored in large amounts of data in multiple warehouses. This type has less query traffic than Real-Time and Traditional warehouse setups.
Some organizations will use more than one of the use cases above. This will result in many instances of a database and servers in use, and increase costs. Open source Relation Database Management Systems such as MySQL are the best solution for most businesses looking to reduce the cost of data warehousing.
MySQL's Data Warehousing FeaturesGiven the use cases I list above, MySQL offers a number of advantages:
- Data/Index partitioning for versions 5.1 and above
- Large table space. A single table can be as large as 110 terrabytes
- Replication
- Variety of indexes (B-tree, full text, clustered, hash and GIS)
- Data compression
- Read-only tables for sensitive data
- Support for multiple operating systems
MySQL is flexible when it comes to choosing a data warehouse engine. Any of the MySQL engines can be used, but there are a few that are best for the job. The ones listed below are the best choice:
- MyISAM
- Archive
- Memory
- Merge
- Federated
Let's consider each of these engines in turn.
MyISAMThe MyISAM engine is the default engine of MySQL and can be a good choose because of high speed query inserts. If offers table level locking, but not row level locking.
ArchiveThe archive engine compresses data up to 80 percent and is a good option for large amounts of data storage. It offers good table scans and has row-level locking. The archive engine only allows data to be inserted and read. The engine does not allow for updates or deletes. This is a great option for storage of sensitive data.
MemoryThe memory table is extremely fast and has fast response time for full-table scans. The data is stored in the system memory and not on disk. It is very important that the data in memory is backed up. If the system goes down, then all data in memory is lost.
MergeThe merge table is great for data partitioning in MySQL versions 5.1 and above. The database administrator can merge one or more tables. Each table can be stored on separate systems.
FederatedThis type of table, allows database managers to create many databases on different servers. The data can exist on the local server or remote server.
MySQL's External Storage Engines NitroEDBNitrosecurity and MySQL partnered together and developed the NitroEDB external database engine. The main benefit from this, is the enormous amount of data that can be stored. The engine also provides concurrent queries.
BrighthouseAnother partnership with MySQL is Infobright, the producers of Brighthouse. Brighthouse improves data compression by using its own optimizer and not MySQL's optimizer. Brighthouse improves compression because the engine stores data in a column style and not row by row. With its column-based approach, Brighthouse can store massive amounts of data.
ConclusionThe growth of MySQL has grown in tremendous strides for the past few years. There are many enterprise features, that can be implemented to solve most needs for an organization.
In today's world, companies are looking to cut cost and also preserve existing technology investments. MySQL provides a lot of power at low cost. This makes it an attractive alternative to many other enterprise solutions.
DrupalCon Re-Cap: Accessibility in Drupal 6 & 7
While at the 2010 DrupalCon San Francisco last week, I had the pleasure of attending Katherine Lynch's session on Accessibility in Drupal 6 & 7. The issue of accessibility is a very important one, especially given the large demographic of users with disabilities. Approximately 20 percent of internet users have a disability. That's more than the number of people who use Internet Explorer 6. If we bend over backwards to appeal to web users with outdated IE 6 web browsers, then we have no excuse for not building web sites that are accessible to people with disabilities. Not only is this a good practice for all web sites, but it's a requirement for government web sites that must adhere to Section 508 provisions.
How Do You Build an Accessible Web Site? So how does one build an accessible web site? Below are some notes I jotted down from Lynch's session, and you can also watch Lynch's full presentation here. It was gratifying to find that many of our standard development practices align with her recommendations. Follow the Web Content Accessibility GuidelinesUse Web Content Accessibility Guidelines (WCAG 2.0) as a reference point. This document covers the basics of web accessibility and BEYOND. It's over 90 pages long, and could be a bit overwhelming for some. Because of the rapid advancement of web technologies, the WCAG 2.0 document was purposely written in a vague manner so it wouldn't be outdated as quickly as the previous WCAG 1.0 document.
Follow the POUR principles Identified in the WCAG 2.0 DocumentBecause the WCAG 2.0 document can be a bit vague, Lynch suggests you follow the four principles of making web content accessible (the POUR principles). Each piece of content needs to be Perceivable, Operable, Understandable, and Robust.
- Perceivable: Information must not be invisible to all of the user's senses.
- Operable: The interface cannot require actions that users cannot perform.
- Understandable: Users must be able to understand the information and operation of components.
- Robust: The content and interface must be understandable, operable, and accessible through widely-available technologies and remain so as technologies advance.
Start your design with the end in mind. Consider accessibility when designing your web site's navigation, icons, and clarity. You can test the clarity of your site by doing a simple squint test to make sure it can be navigated by people with vision problems.
Use a Zen ThemeUsing Zen themes as a starting point for your theme, is a big step in the right direction. The principles behind the Zen themes and accessibility have a lot of overlap.
Make Your Theme Colors AccessibleTo make your web site accessible to user's with color blindness you need to make sure there is at least a 3:1 color contrast ratio, don't use color only cues, and make sure link colors are as standardized as possible.
Always Use Alt Text on your ImagesAlt text is used by screen readers and mobile devices. All content images should have a descriptive alt text and all decorative images should have empty alt text which will be ignored by screen readers.
Use Skip LinksSkip links are useful for helping users with screen readers skip over repeated chunks of content on your web site.
Follow Styling StandardsUse header tags (H1, H2, etc); use labels, fieldsets, and a legend for your forms; and use scope, ids and headers, and the summary attribute for your tables.
Consider Layout/Source OrderScreen readers can only read content linearly. Keep that in mind when designing the layout of your website and considering what you want users to read first.
Accessibility and Drupal Module Development Create Accessible Admin InterfacesIf you have a site with user generated content, your administrative interface should abide by the same accessibility principles as your front end interface.
Do Not Require an Action That Not Everyone Can DoAvoid keyboard traps where you cannot navigate a site with a keyboard, or you can get somewhere using a keyboard but you can't use a keyboard to get out. Don't override the tab, arrow, or esc keys, unless you let the user know very early in the HTML that they are being overridden. Actions should be triggered "on click" or "on focus," not on "mouse down" or "key down." Also, never trigger a form submit using "on focus."
Provide Accessible Alternatives for Audio, Video, and Synchronized MediaAudio should have a text only transcript along with the capability to turn audio up/down off/on. Your site should not auto play audio. Video should have a text or audio equivalent. Synchronized media (both video and audio) should have open/closed captions or a descriptive audio track.
A couple of modules that work with captions are the Media and SWFTools Modules for Drupal 7.
ConclusionThis session was full of useful tips to get one started on making their site more accessible to web users with disabilities. I hope this was as informative to you as it was to me. If not, please watch Katherine's full session online and hopefully you will change your mind. Thanks for reading.
Hoverable: A New WordPress Plugin for the Open-Source Community
One of our goals at Forum One is to help our clients achieve a higher level of collaboration. The effective exchange of knowledge is one of the hurdles separating a good organization (or open-source project) from a great one.
An open-source project is only as good as its underlying community. This includes supporting code contributors and helping others in communications channels such as forums and IRC. It also includes listening to those who toss in the occasional idea or constructive criticism. Since we use open-source tools on a daily basis, it is in our own interest to make sure that they continue to innovate and improve over time.
We're no stranger to relying on open-source tools, and it only makes sense to contribute back to the respective communities whenever possible. We've used Drupal and WordPress for dozens of sites thus far, and each site uses plugins or modules that have been contributed by some third-party entity. Without generous community members, Drupal wouldn't have CCK, Views, Panels, and many other essential modules.
In this spirit, Forum One is ramping up our efforts toward contributing back to the Drupal and WordPress communities. For starters, I have added Hoverable to the WordPress.org plugin repository. Hoverable allows you to enter a search string within a blog post or page, and enter some context data for that search string. Whenever a match is found, it will link the search string to a lightbox window, containing the context.
For example, you can use it to show the user a glossary term on a given page, as is done on this page for the new site HealthReformGPS.org. For example, on this page, if you click on a term such as "Exchanges" — or any word with a dashed underline, the term's definition appears in a lightboxed window.
Here's a look behind the scenes. Below is the section of the administrator's interface for Hoverable. The top field includes the search term "start blogging." Its respective context is in the second field.
Below is how it looks on the public side. When the user clicks "start blogging," a lightbox window appears to show the attached context:
Hoverable is a small contribution to the WordPress community, but one we expect will be valued by many.
Mobile Application Development Using PhoneGap
After attending last week's PhoneGap training session in Washington, DC I have finally made that frightening step into mobile application development. Why was it frightening? Well, it was the expectation that I needed to face another steep learning curve just to get that infamous "Hello World" text to display on my cell phone when I rather be spending my time making my lawn look better than my neighbors.The irony of that is that growing grass and maintaining it requires a learning curve as well! So, I might as well invest my time in learning how to develop mobile applications to make more money so that I can rectify my lawn care mistakes.
So, you may be asking yourself, "What is PhoneGap?" It took me a minute to figure out its role in developing mobile applications but after working with it for the last few days I have boiled my understanding of it down to it being a JavaScript API which enables developers to access common mobile features such as contacts, camera, geo location, sound, etc. without having to worry about compatibility issues across mobile devices (iPhone, BlackBerry, Palm, Android). In a nutshell, I should be able to write an application once and use that same code across other devices.
Due to the fact that PhoneGap is fairly new and there are still some limitations or capabilities that haven't been added or perfected, it was advised to pick a single mobile device to develop on. Once the application is fully operational and well tested, use that same application code to begin development for other platforms and debug it as you go from one platform to the next. Again, these compatibility issues are being resolved as PhoneGap evolves so dont abandon it as a tool just yet.
Now considering that we received Palm Pre's during PhoneGap's DC training workshop to develop on, I will use the Palm platform to demonstrate how to get your development environment setup to begin developing a mobile application.
Process OverviewHere is a rough outline of the steps involved in getting your development environment set up, creating a new project, packaging your application, and launching it on your mobile device.
- Download and install the current mobile SDK (software development kit) for the device in which you will developing.
- Download PhoneGap.
- Start a new project.
- Package and launch your project.
The software development kit is provided by the companies that have developed the mobile platforms (ie. Apple iPhone, Palm webOS, etc.). So depending on which platform you are developing on you will need to download the respective SDK.
I'm not going to rewrite this step as Palm has done an excellent job of providing instructions on how to setup their SDK. Butmake sure to read the instructions provided for the operating system that you will be developing on.
After installing the SDK, Palm's in this case, you will often have a set of commands available within your terminal or command line application that will allow you to perform certain operations to aid you in the development process. For Palm, their SDK includes the following:
- palm-generate: Generates basic applications and scenes.
- palm-package: Packages a Palm application before installation on the emulator or a device.
- palm-install: Installs a packaged application on the emulator or a device.
- palm-launch: Launches an installed application on the emulator or a device.
Read more about Palm's SDK command line tools.
I personally would encourage developers to get familar with these commands but it isn't necessary to get started though as PhoneGap has made it so that these commands are automated via a "make" file.
Step 2: Download a copy of PhoneGapPhoneGap has done a great job of providing instructions on how to download their product. I will however advocate for using "git" to obtain a copy of the code. You can learn how to install git and use git to get a copy of PhoneGap.
Step 3: Setting up a new ProjectIn order to create a new project, which is quite simple, you will need to open a terminal or command line application and navigate to the directory in which you have downloaded your PhoneGap files.
Within PhoneGap's set of instructions on downloading the PhoneGap code you will have the following directory structure once you've completed Step 2.
phonegap|- README.md
|- android/
|- blackberry/
|- iphone/
|- palm/
|- symbian.wrt/
Step 4: Packaging and launching your application
Within each mobile platform directory you will notice a Makefile in which you can execute by simply typing in the command, make:.
Here is a screen shot of me running the command, make:
This will package your application and install it on the first qualified device be it a USB-connected device or the virtual emulator. Please note that each mobile platform directory is a self containing application specific to each mobile platform ready to be packaged, installed, and executed without writing any code. These directories are nothing more than simple applications serving as a template for development.
Now, it's highly recommended that you test your applications on a USB-connected device as it will be faster to execute and interact with. Besides, what better testing environment could there be aside from an actual mobile device? Exciting, isn't if?
The emulator, on the other hand, tends to be slower which can become aggrevating especially when debugging an issue. If, however, you dont have a mobile device laying supporting the mobile platform that you are developing for then the emulator is your only option.
I hope that this post will help you along in getting started with using PhoneGap. I will be posting additional posts as I get deeper into developing mobile applications using PhoneGap so stay tuned.
If you have any questions please feel free to post a comment or visit Google Groups to discuss your questions with other PhoneGap developers. Good luck and happy development!!
Helpful ResourcesPhoneGap Homepage: Home Page
PhoneGap Google Groups: Developer Discussions
PhoneGap GetHub: Contains great instructions on using git to download PhoneGap project code.
Open Source Lifestream projects using PHP
Both of the lifestreaming projects discussed in this Ars article are both built using PHP frameworks, one Zend and the other is CodeIgniter. I was pleasantly surprised, I expected them to be Python or Rails projects.
In days of yore, all a self-respecting Internet enthusiast needed to be at the forefront of Web hipness was a simple vanity page with pictures of their dog and maybe a few moderately coherent rants about technology or politics. In the modern world of Web 2.0, where our thoughts and activities are strewn across the global network of tubes, we need slightly more sophisticated solutions to capture and convey the ethos of our Internet identities.
How to write a Drupal Panels style plugin
If you're looking to build your own style plugins for Drupal's Panels module, the following post will step you through how to do so. The functions and files you'll need to declare are a bit arcane and this beats stepping through the panels code yourself. I've found the rounded corners plugin that ships with panels is a good reference to review when you're stuck.
Creating panels styles can be very powerful. You can define certain styles for your client to choose from, so they can choose what type of display the panel pane will be like. This way you keep the workflow clean, your code under revision control, your themer gets to keep his sanity, and your concious stays clear.
How to create a panels style plugin | manuee – Putting the puzzle together.
