David Sterry's Blog


Tuesday, January 31, 2006

My own Google

I'm writing my own search engine. I know it sounds crazy, but it's just the sort of project that fuels my fire. For now, I'm limiting the scope of the search to URLs that have appeared on my Trend Sweet Trend page, which currently number more than 7,000. I've written three Perl scripts so far and set up five tables on my newly LAMP'd web server.

One table stores the URLs and the last time each was indexed. Another is a queue of URLs waiting to be indexed. A third stores the parsed text from the indexing operation. The other two aren't storing anything yet, but they will hold a cache of each page and keep track of links between sites once I unleash a crawling feature.
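Here's a rough sketch of how five tables like these might be laid out, using Python's built-in sqlite3 module as a stand-in for MySQL. The table and column names are hypothetical (the post doesn't give them), and `queue_stale` is only an illustration of the kind of query a "what needs indexing?" script could run to fill the queue.

```python
import sqlite3

# Hypothetical schema: the real setup uses MySQL, and the post doesn't
# give table or column names.
SCHEMA = """
CREATE TABLE urls (
    url          TEXT PRIMARY KEY,
    last_indexed TEXT          -- ISO-8601 timestamp, NULL if never indexed
);
CREATE TABLE index_queue (
    url TEXT PRIMARY KEY       -- URLs waiting to be indexed
);
CREATE TABLE terms (
    url         TEXT,
    term        TEXT,
    occurrences INTEGER,       -- how many times the term appears on the page
    PRIMARY KEY (url, term)
);
-- Not used yet: page cache and link graph for a future crawler.
CREATE TABLE page_cache (
    url  TEXT PRIMARY KEY,
    html TEXT
);
CREATE TABLE links (
    from_url TEXT,
    to_url   TEXT,
    PRIMARY KEY (from_url, to_url)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)


def queue_stale(conn: sqlite3.Connection, cutoff: str) -> None:
    """Queue URLs that were never indexed or were last indexed before cutoff.

    Timestamps are ISO-8601 strings in one format, so plain string
    comparison orders them correctly.
    """
    conn.execute(
        "INSERT OR IGNORE INTO index_queue (url) "
        "SELECT url FROM urls "
        "WHERE last_indexed IS NULL OR last_indexed < ?",
        (cutoff,),
    )
    conn.commit()
```

`INSERT OR IGNORE` keeps a URL from being queued twice if the check runs again before the indexer catches up.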

As for the three scripts, the first grabs the URLs from my Trend Sweet Trend database and puts them in my search engine's crawling table. The second checks when each site in the URL table was last indexed and adds the ones that need indexing to the queue. Finally, the third uses curl to visit each page, parses out the text with HTML::Parser, and stores the count of each term in my search terms table. It's quite primitive, indexes a lot of garbage text, and can't handle Boolean queries, but besides that it's great. Watch out, Google. Anybody got a spare datacenter with 100,000 commodity boxes?
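The indexing step boils down to "strip the tags, split the text into terms, count them." My script does this in Perl with curl and HTML::Parser; below is a minimal Python sketch of the same idea using the standard library's html.parser instead. It works on an HTML string rather than fetching the page, and the tokenizing rule (lowercased runs of letters and digits) is an assumption.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def count_terms(html: str) -> Counter:
    """Return a term -> occurrence count mapping for one page."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    # Hypothetical tokenizer: any run of ASCII letters or digits is a term.
    return Counter(re.findall(r"[a-z0-9]+", text))
```

Skipping script and style contents is one cheap way to cut down on the garbage text; each resulting (term, count) pair would become a row in the terms table.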

Sunday, January 29, 2006

Linux tribulations

Today was a big day for me and Linux. I started the day with a Linux web server with PHP but no MySQL. That's a LAP server for all you techies. Pretty decent for serving static pages but not too great when it comes to doing anything with data.

I'd never had MySQL working on this thing, so I did a "yum install mysql-server" to install it. Then it was time to start the server up! Nothing. Well, not nothing: I got an error and a message that MySQL had ended. A little searching turned up a thread on this problem suggesting that SELinux was the culprit. It was keeping MySQL from accessing its database storage files. I ran "setenforce 0", which switches SELinux to permissive mode, and all was well. That change only lasts until the next reboot, though; to make it stick, set SELINUX=permissive in /etc/selinux/config. (A tip I got in my chat room was to use "tail -f /path/to/logfile" to watch a file when something's going wrong. It shows you a live view of the log, so you'll want to open it in a separate terminal.)

Once I had the MySQL server running, I set a new password and thought I'd install phpMyAdmin to see if I could control it through the web interface I'm so familiar with. Not so fast, bucko. phpMyAdmin complained that MySQL support was not available to PHP! I created a test script that calls phpinfo() and saw that PHP had been compiled with --with-mysql=shared,/usr.

That's supposed to mean that MySQL support is loaded as a shared extension to PHP, but I couldn't find a suitable extension on my system. So I thought, I'll just compile PHP. I downloaded the latest stable source for PHP 4, configured and compiled it, and tested phpMyAdmin again. Nothing. It seems the PHP that Apache was using was compiled into Apache itself, so my changes didn't take effect.

Finally, I decided to compile both Apache and PHP from source. Apache would be built to load PHP as a module, and PHP would be built with MySQL support compiled in. I downloaded and compiled Apache 2, then PHP. Once both were built and installed, I manually added lines to httpd.conf to activate PHP. At this point, things were running well. I had a LAMP server!

I could use phpMyAdmin, but unfortunately so could anyone else with access to my web server. The solution: add an .htaccess file to the phpMyAdmin folder so that a username and password are required to get in. It's been a great learning experience, and I kind of want to do it again on another box to see how much of what I've learned carries over to my next LAMP setup.

Friday, January 20, 2006

Collab.nl

I ran across a pretty interesting chat service at Collab.nl.

It's a Flash chat application geared in some way toward multi-user development, but all I saw there was people chatting and having a good time. If you've got a mic and webcam, you can go there and get your chat on.

This app kind of puts AJAX in perspective against something like Flash. At this point there's really no way to do in JavaScript what Collab.nl does with Flash. JavaScript would need streaming video libraries coded native to the OS and a fast server component similar to Flash Media Server.

Macromedia is in a really great position, since lots of video content (like YouTube and Google Video) is being distributed through Flash. In addition, all of the rich-client apps that people are building with AJAX could really be done better on the Flash platform. They'll be prettier, they'll run faster, and they'll be able to be ... umm ... well ... richer. I'm talking video and sound to go with your non-reloading AJAX form page.

So Flash seems to be the future. How can we open it up and extend it? We need to take the kernel of the Flash idea, build upon it, and explode it: move as much machine code to the client as possible. That's really what Flash does. Build into the browser the ability to download and run OS-specific, optimized multimedia code. Require the code to be signed, and people can start building trust with the developers of this new platform. Any ideas?

Thursday, January 19, 2006

Camp SysAdmin 2006

I attended Camp SysAdmin on Saturday in the SOMA area of San Francisco. It was fun listening to and participating in discussions about IT over the two sessions. Each session was an hour long, and they were separated by a pretty decent lunch.

The first session I went to was about mixing open source and proprietary software in the enterprise. Brian Aker, Director of Architecture at MySQL, guided the conversation, which evolved into a discussion of how best to use open source software and how to influence its development by getting your changes made and accepted into the project.

Brian suggested that open source apps are best used like commercial apps. Don't change them just because you have the source code. They are generally well designed, so use them as they're intended. That way you'll get the most bang for your buck and fewer headaches later.

If you want changes made or features added, try to get someone with commit privileges on the project to make them. If they don't want to make the changes, try explaining why the feature would be useful to you and others. If the project leader or owner doesn't have time to make the changes, they'll probably have a list of contributors who would have time to work for you. Also think about helping the project by providing documentation, code, servers, etc.

The second conversation I sat in on was about troubleshooting messaging. It was led by Eric Allman of Sendmail, who wrote the original Sendmail back in the day and was a great resource to have in the room. He kicked off the meeting with: What sorts of problems do you see with your email systems?

The top problems that came up: large attachments, Exchange Server message store limitations, bandwidth problems, and the difficulty of sifting through logs to follow transactions through systems such as SpamAssassin, Sendmail, and greylisting.

Eric mentioned that the DomainKeys system may help curb spam by letting recipients verify which domain actually sent a message. Once the sender is authenticated, some sort of reputation system can come in to help with the final decision to accept a message. There's an IETF working group forming to help this technology along.

Overall, Camp SysAdmin was a worthwhile event. Big ups to Splunk, the camp's primary sponsor, to the other sponsors, and to everyone who came out to make it what it was.

Friday, January 13, 2006

Chatr 0.5 released

Well, I've just released the next version, and the changes are numerous. It's even starting to look like a real chat room!

As the project gains capabilities, it's also becoming more complex. This is the first release where I merged a contributor's code, so I tried my hand at using diff and patch to do the job. Since I do my development on Windows, I downloaded UnxUtils from SourceForge to get those tools. Then, after a little Python scripting, I had a workable diff and patch system.
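The Python side of a setup like this can be small. Here's a hypothetical helper (not the actual script) that uses the standard library's difflib to produce a unified diff, the format patch understands; the file name in the usage example is made up.

```python
import difflib


def make_patch(old_text: str, new_text: str, filename: str) -> str:
    """Return a unified diff that turns old_text into new_text.

    Both texts should end with a newline: difflib does not emit the
    'No newline at end of file' marker that patch would otherwise expect.
    """
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=filename,
        tofile=filename,
    )
    return "".join(diff)


if __name__ == "__main__":
    old = "function poll() {\n  return 1;\n}\n"
    new = "function poll() {\n  return 2;\n}\n"
    # 'chatr.js' is a hypothetical file name for illustration.
    print(make_patch(old, new, "chatr.js"), end="")
```

Identical inputs produce an empty string, which makes it easy to tell when a contributor's copy hasn't changed anything.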

One feature I added to this version is the ability to see when the chat room has new conversation while the window is minimized. I do it by using MD4 hashes and checking whether the hash has changed since the last time the chat window had focus. I think this is a bit on the processor-intensive side, so I'm looking for a faster hash function to keep the client running fast.
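The check itself is simple: remember a digest of the chat when the window loses focus, then compare later. The Chatr client is JavaScript; this Python sketch only shows the logic, uses hypothetical names, and swaps in zlib.crc32, a lightweight non-cryptographic checksum, in place of MD4. Nothing here needs collision resistance, since a rare collision would only mean a missed notification.

```python
import zlib
from typing import Optional


class ActivityWatcher:
    """Report whether chat text changed since the window last had focus."""

    def __init__(self) -> None:
        self._seen: Optional[int] = None  # digest taken when focus was lost

    @staticmethod
    def _digest(text: str) -> int:
        # CRC-32 stands in for MD4 here (an assumption, not Chatr's code).
        return zlib.crc32(text.encode("utf-8"))

    def on_blur(self, chat_text: str) -> None:
        # Remember what the chat looked like when the user looked away.
        self._seen = self._digest(chat_text)

    def on_focus(self) -> None:
        # The user is looking again, so nothing is "unseen" any more.
        self._seen = None

    def has_new_activity(self, chat_text: str) -> bool:
        return self._seen is not None and self._digest(chat_text) != self._seen
```

An even cheaper option would be to compare a message counter or the length of the chat log instead of hashing the whole text on every poll.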

I really appreciate all the feedback and I'm looking forward to comments on the new version. Oh yeah, it's now an official open source project using the BSD license. Feel free to make changes and send me your diffs.

Friday, January 06, 2006

Chatr developments

I'm continuing to work on Chatr, my AJAX/PHP chat system. The last couple of features I've worked on improve usability and utility, and they'll be included in 0.5 after I test them a bit more.

First, the system can now send an email to an address of the administrator's choice when someone enters the room. For low-volume sites, this can work like a chat pager if you set it to email your cell phone. For high-volume sites, this could be annoying, but fun all the same as you announce to those around you: "Somebody just entered my chat room!"
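Chatr itself is PHP, so this is only an illustration of the idea in Python: build a short message and hand it to an SMTP server. The function names, the default sender address, and the SMTP host are assumptions; for the pager trick, the recipient would be whatever email-to-SMS address your carrier provides.

```python
import smtplib
from email.message import EmailMessage


def build_entry_notice(nick: str, room: str, admin_address: str,
                       sender: str = "chatr@localhost") -> EmailMessage:
    """Compose the 'someone entered the room' email (names are hypothetical)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = admin_address
    msg["Subject"] = f"{nick} entered {room}"
    # Keep the body short so it reads well on a phone.
    msg.set_content(f"{nick} just entered the chat room '{room}'.")
    return msg


def send_notice(msg: EmailMessage, host: str = "localhost") -> None:
    """Hand the message off; assumes an SMTP server is listening on host."""
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)
```

Keeping composition separate from sending makes the message easy to check without a mail server, and lets a busy site swap in a digest or rate limit before calling `send_notice`.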

Second, most chat and IM software has a way to notify you when a new message arrives while the window is minimized. You might hear a sound or see the taskbar icon flash. I've updated the JavaScript client so now you'll see "***" in the taskbar icon (and title bar) if someone speaks, joins, or times out.

Finally, 0.4 brought an idle mode to the JavaScript client that reduced server load by cutting the number of requests the client makes after 2 minutes without posting. I've changed it so the client now comes out of idle as soon as the user starts typing. It just makes for a more responsive chat.
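The idle logic amounts to a tiny state machine: any activity resets a timer, and once 2 minutes pass with none, the client polls less often. Here's a Python sketch of that logic (the real client is JavaScript). The 2-minute threshold comes from the post; the class name and both polling intervals are hypothetical.

```python
import time
from typing import Callable


class PollScheduler:
    """Decide how often the chat client should poll the server."""

    IDLE_AFTER = 120.0     # 2 minutes without activity -> idle (from the post)
    ACTIVE_INTERVAL = 2.0  # hypothetical poll interval, in seconds, while active
    IDLE_INTERVAL = 10.0   # hypothetical, slower interval while idle

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._last_activity = clock()

    def note_activity(self) -> None:
        """Call on every post and, with the change above, on every keypress."""
        self._last_activity = self._clock()

    def is_idle(self) -> bool:
        return self._clock() - self._last_activity >= self.IDLE_AFTER

    def next_interval(self) -> float:
        return self.IDLE_INTERVAL if self.is_idle() else self.ACTIVE_INTERVAL
```

Calling `note_activity()` from the keypress handler, not just the post handler, is what lets the client leave idle mode the moment someone starts typing. Passing in the clock makes the behavior easy to test without waiting two minutes.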

I'm getting a lot of great suggestions from people who are trying out the chat. In this case, eating my own dog food is a blast!