Sunday, January 13, 2013

Are Internet mailing lists a dying breed?

That is the question I've been asking myself lately. Mailing lists used to be a staple communication mechanism on the Internet. Now nearly all of the mailing lists I'm subscribed to are very quiet - still have 10,000+ subscribers on most, just no one seems to use them. The distinct trend I am noticing is that people are forgoing mailing lists and using quick question and answer sites like Yahoo! Answers and StackOverflow to get the answer to their questions. (Or using the Facebook commenting system or Twitter - but that depends on your friends, connections, and followers). Experts Exchange used to hold the position and used to do quite well but then shot themselves in the foot by putting annoying barriers in the way. People went back to mailing lists after that fiasco.

Unfortunately, there are two significant problems with the Q&A websites out there that mailing lists solve and StackOverflow is demonstrating the problems quite well.

The first problem is community and ownership. On a mailing list, anonymity is possible but you don't get to be a community leader let alone the mailing list owner by being anonymous. About half of the StackOverflow questions get shut down before they get started by moderators who hide behind rather anonymous-sounding usernames. Yahoo! Answers would have suffered a similar fate had spammers not found it to be an effective medium. StackOverflow/StackExchange is experiencing growing pains by leaps and bounds but is suffering the Wikipedia effect, where there is significant power in the hands of few people who aren't vetted very well and repeatedly show up to cause long-term damage to the site. The current set of SO moderators police the site and abuse their power by shutting down valid questions after only being up for a few minutes. On the other side of things, mailing lists are generally open to everyone to ask questions and any moderation queues are used to just filter spam from reaching list members. The current SO strategy will ultimately kill the site in the long run and significant damage has already been done, but most people don't realize it yet. There is also the question of who owns the site. Ownership is important because it creates the important hierarchy of accountability. StackOverflow, from the observer perspective, appears to have no owners, which makes it seem like a free-for-all website. That also causes problems that are a lot harder to pin down in a single sentence but suffice it to say that where obvious ownership exists, chaos, which always exists, is better kept under control.

The second problem is continuity and continual learning. StackOverflow, Yahoo! Answers, forums, and other mediums are hit-and-run. You ask your question, you get your answer from someone, you give them karma/points/whatever, and you go away and generally forget that there are other people who need help. These sites imbue selfishness. Whereas a mailing list is a continual stream of thoughts - there are regulars but other people help out too as part of a continual community effort to improve each other. Everyone picks up tidbits of information here and there and refines knowledge in a common area as well as occasionally replying to posts, which further contributes to the stream of thoughts.

Now I'm going to give one downside to mailing lists that StackOverflow really excels at: SO excels at bringing together a collection of strategies for software development and selecting the best approach at the point in time that the answer is selected. It is basically a Wikipedia for common software development questions and answers and, specifically, produces a set of best practices that are impossible to obtain elsewhere in a single location. Which is why the site, if it doesn't change, will suffer the same fate as Wikipedia, only it is more deviously hidden. In fact, we can already see SO turning into a Wiki where the answers can be edited by other people.

This puts me in a bit of a dilemma: Should we use mailing lists for our questions and answers? Should we use StackOverflow/StackExchange/Yahoo! Answers/forums? Twitter/Facebook? It would be nice if we could somehow have the best of all of these worlds. This seems to be what we have been trying to strive for over the years of IRC, mailing lists, and Q&A websites: Hey, I asked a question that someone else may have gotten a great answer for, so I should use that, but I also want the personal touch rather than "Closed as Duplicate/Too Localized/etc." without any interaction by those closing the question. (Side note - "Too Localized" is irritating because it comes off as "You asked a Dumb Question, go away" - imagine how that would make you feel being the recipient of that.) Closure of questions is the equivalent of "This conversation is over because we decided it is over and there is no disputing our decision." Humans have emotions and desire interaction. Therefore, canned responses and question closures are too stoic and drive people away. Okay, so you've repeated yourself a zillion times already and it is kind of boring to do it again, but to someone else, it is that direct response that says, "You are important so I won't brush you off as a nobody."

If we can achieve the above while simultaneously having an effective database of best practices, it won't matter if mailing lists died. I'll be sad that I can't simply use my desktop e-mail client for community communication, but I'll move on too.

Saturday, January 05, 2013

Setting up MySQL + Postfix + Dovecot to do Gmail-like 'youremail+whatever@domain.com' plus multiple delimiters

When I set up my Postfix + Dovecot + MySQL installation, I wanted GMail-like filtering for my domain. GMail allows you to do 'youremail+whatever@gmail.com' and it will automatically be delivered to 'youremail@gmail.com' with the label 'whatever'. From there, it can be filtered into the folder of your choice. I figured something similar would be very useful when registering on websites where I'm not necessarily wanting to use a mailinator address but do want to track whether they sell my e-mail address or not. Well, I thought doing the same thing would be useful, but more on the difficulties I've encountered with special characters like '+' in a bit.

In MySQL, I have 'virtual_aliases' and 'virtual_users' tables. If I remember correctly, Postfix first attempts to find an e-mail address in users, then aliases, then tries again without the extension specified by 'recipient_delimiter'.

To set up Postfix with a 'recipient_delimiter', open up '/etc/postfix/main.cf' and add:

recipient_delimiter = +

Save the file. I seem to recall that 'recipient_delimiter' only works with aliases, not mailboxes, so that might explain why I had to add the 'virtual_mailbox_maps' value to the 'virtual_alias_maps' line. Or maybe I'm confused. I set all of this part up almost two years ago, so the memory's a bit rusty because I've been busy working on my software products.

Now open up '/etc/postfix/master.cf' and change the dovecot delivery line from '-d ${recipient}' to '-d ${user}@${nexthop}'. Here are the full lines:

dovecot unix - n n - - pipe
flags=ODRhu user=vmail:vmail null_sender= argv=/usr/lib/dovecot/deliver -c /etc/dovecot/dovecot.conf -f ${sender} -d ${user}@${nexthop}

Save the file. Run the whole 'postfix reload' thing. At this point, Postfix and Dovecot will handle e-mail just like GMail does. However, as I said earlier, the '+' symbol doesn't work very well. You'll quickly discover, as I have, that there are a lot of broken web forms out there that won't accept it because web developers use regular expressions to "validate" e-mail addresses. E-mail addresses are so complex that regular expressions don't cut it. Since there are a billion different regexes out there for doing broken "validation", I wanted to use "multiple recipient delimiters" with my Postfix + MySQL setup. Scouring Google, I found exactly one post that started to deal with this issue, but only barely. Their solution was to create a new MySQL-based 'virtual_alias_maps' entry that runs the query:

select concat(replace(left('%s', length('%s') - instr(reverse('%s'), '@')), '_', '+'), '@', reverse(substring_index(reverse('%s'), '@', 1))) "goto" from domain where name = reverse(substring_index(reverse('%s'), '@', 1)) and active = 1 and instr(left('%s', length('%s') - instr(reverse('%s'), '@')), '_') > 0

Yeah, my eyes bleed too. It is an interesting approach but they are replacing every '_' character with a '+' character such that 'my_user_name_extension@domain.com' becomes 'my+user+name+extension@domain.com' and Postfix will then look up 'my@domain.com' and not find anything when it should have looked up 'my_user_name@domain.com'. It also risks potential delivery issues to random addresses depending on how Dovecot is set up. It seems that a better approach would be to first ignore any address that already contains a '+' symbol, since Postfix handles that natively. Then, make sure the address part contains the current symbol. Finally, locate the first instance of the current symbol and construct a new string excluding everything after it up to the domain part and look THAT address up in the appropriate table. This approach will work with small and large databases:

SELECT dest FROM virtual_aliases WHERE INSTR('%s', '+') = 0 AND INSTR('%s', '@') > 0 AND INSTR(LEFT('%s', CHAR_LENGTH('%s') - INSTR(REVERSE('%s'), '@')), '_') > 0 AND source = CONCAT(SUBSTRING_INDEX(LEFT('%s', CHAR_LENGTH('%s') - INSTR(REVERSE('%s'), '@')), '_', 1), '@', REVERSE(SUBSTRING_INDEX(REVERSE('%s'), '@', 1))) AND active = 1

MY EYES! THEY BLEEEEED! I feel like I've done an injustice to the Internets after writing that. Let's break it down. Basically, it returns a 'dest' (this is the aliases table after all) if it finds a matching 'source' based on string manipulation that strips the first match of a specific symbol plus the extension. In this case, it is looking for the '_' character. But, before it even does a lookup, it verifies that the string being searched for doesn't contain a '+' and does contain both a '@' and a '_' for sanity checking purposes. MySQL will resolve the INSTR() and CONCAT() mess first before doing anything else. Then it will start looking at the database table. If there is a MySQL index on the 'source' column, then this will be a super-fast, index-based lookup regardless of database size. But good grief, look at that awful mess again! I don't think Postfix will let me have that on multiple lines to make it potentially readable.

Okay, now let's assume we want some additional flexibility with '_'. Let's say that we want to remove everything after the last underscore instead of the first underscore (e.g. so 'my_user_name_extension' becomes 'my_user_name' instead of just 'my'). Postfix will execute things in order and stop when it gets a non-empty response, so the above query could be run first and then this can be run second to catch this new special case:

SELECT dest FROM virtual_aliases WHERE INSTR('%s', '+') = 0 AND INSTR('%s', '@') > 0 AND INSTR(LEFT('%s', CHAR_LENGTH('%s') - INSTR(REVERSE('%s'), '@')), '_') > 0 AND source = CONCAT(REVERSE(SUBSTRING_INDEX(REVERSE(LEFT('%s', CHAR_LENGTH('%s') - INSTR(REVERSE('%s'), '@'))), '_', 1)), '@', REVERSE(SUBSTRING_INDEX(REVERSE('%s'), '@', 1))) AND active = 1

If you don't see the difference, look for the additional REVERSE() calls. Okay, the above might not actually work since I didn't try it out and I think the SUBSTRING_INDEX() call will pull off the part I want to remove instead of getting rid of it. I'm using the first query, but not this one. My head hurts from looking at this mess. If you fix the second query, let me know. It might just be better for 'my_user_name' to use a different delimiter such as a hyphen and then have a '-' based query because doing this could potentially introduce some weird issues.

Each query has to be in its own file and you'll want to check the 'users' table as well. So, to do both queries above for each character for both tables, you'll need four distinct files on the file system. This gets to be rather messy if you want to do a combination of characters such as '+', '_', '-', and '.' which results in a lot of distinct files (and potentially lots of distinct MySQL connections?). I suppose you could make one file and use OR with the query but you risk losing index functionality (the OR keyword tends to have the effect of doing full table scans) and the SQL query would become a vomitastic monstrosity. Actually, if each file means a separate database connection, it might be faster to not worry about the index. I honestly don't know which will be faster. I assume indexes will be faster and that Postfix intelligently maintains connections to MySQL, but I don't know and I don't really want to go digging around source code to find out. Postfix runs a ton of SQL queries per e-mail already but it is preferable to run a lot of fast queries instead of a few slow ones. At any rate, make it work first with semi-readability, worry about performance issues later. Don't forget to 'postfix reload' and be sure to check file permissions so that Postfix can read the file.

If you do use this query, please document the file(s) so someone (including you) viewing the query later doesn't have a "what in the..." head-scratching moment.

Disclaimers: I have never applied this many MySQL functions to a single query before. My knowledge of MySQL functions is fairly limited. I call CHAR_LENGTH() because it seems more Unicode-friendly than just plain LENGTH(). I also use the specific table for aliases instead of the domain as in the original example because, looking elsewhere on this topic, I've learned that Postfix likes 'specific' rather than 'general' definitions and not creating a potential exploitable delivery vulnerability in the system is important. I blame the authors of Postfix for this major oversight in their product that it requires a hack in MySQL to fix a problem that is clearly within their domain and capability to fix by simply allowing multiple delimiters. I'm also not responsible for any heart attack, stroke, stress, panic, heart palpitations, sweating, shock, bleeding from the eyes, weight gain, weight loss, hair loss, mutation, rage quit, and other medical disorders as a result of this blog post.