A Letter to James

Dear Mr. Conley,

I regret to inform you, and all of your fellow Apple lovers, that I have purchased one of the new 15″ MacBook Pro laptops. You are probably thinking that this is fantastic news; however, it means that I bought the laptop purely for its price-to-spec-sheet ratio, which additionally means that I will be putting a copy of Windows 7 on it immediately. It’s actually going through the process as we speak. Finally, you are not going to be pleased, because this means that I will most likely be talking some serious Apple smack with the new device. If only the keyboard weren’t so majestic…

[Image: the “WinBookPro”, a Retina MacBook Pro being loaded with Windows 7]

I once owned a laptop under the branding MacBook Pro. It was a fantastic laptop: well built, fast, and with the best keyboard I have used on a laptop to date. It was purchased in 2010 for $2,700 and sold in 2013 for $600. The $600 was then used to purchase an atrocity commonly referred to as a “Surface Pro.” This was a mistake.

In 2011 I swore to never purchase another Apple product, partly because I found Apple’s operating system poor and partly because I opposed the business practices in its assembly plants. Those reasons have since been disregarded, as the latter would force me to never buy 90% of technological devices again. The OS is still garbage, but damn, this hardware is beautiful…

In summary, I will do my best not to be a belligerent Apple-hating fool on Twitter and various other social networks. If you would like, when tweeting pictures of my beautiful new piece of technology, I can include a hashtag along the lines of #dontLookJamesItsAWinBookPro. Let me know if this precaution is required.

Sincerely,

Your friend Caleb

<insert poorly written signature here>

Startup Weekend 2013 Lessons

[Image: Startup Weekend Kamloops]

Bring a decent machine to work on.

I attended the event with a Surface Pro loaded with Ubuntu and a Wi-Fi dongle. Not an ideal setup when attempting to create a prototype to show the judges and the crowd. It didn’t help that when the power was plugged in, the touchpad would send the cursor jumping across the screen even when my finger wasn’t moving. I learned later that pushing down on the top-left corner of the device corrected the issue, leading me to believe the device wasn’t properly grounded. On top of all of this I was working in an unconfigured copy of vim for everything, and the keyboard in use lacked an Insert key, making pasting a rather annoying procedure that involved pushing down on the left side of the tablet while right-clicking to paste.

Lord of the Flies, the sequel…

When you hop into a group of people where everyone is a stranger to everyone else, an immediate and unnecessary power struggle breaks out between some of the people in the group. I found that a good cure for this is reminding people that there is a group leader, followed by asking what the group leader thinks of the idea. Side note: it’s amazing how much buy-in you can get by asking people the right questions. Never demand; simply let them come to understand your reasoning by asking the right questions.

Come prepared.

If you want to have a really good time at a Startup Weekend, make sure that you do some download preparation. Things like cloning your repo code ahead of time and making sure that all of your Android SDKs are up to date will make things magical when you don’t know what the internet will be like. The first 3 or so hours were spent just getting everything in line for one of the developers on the team. This could have been easily handled with some quick prep.

You aren’t allowed to pre-code something before the event, but you are allowed to hack something together with existing frameworks and libraries.

Book the next Monday off

I quickly learned that spending 30-plus hours of solid development time in a two-and-a-half-day window is really not as enjoyable as doing the same thing at home. The added stress of making sure that what you contribute is done in a timely manner and that you aren’t letting anyone down can take its toll on you when it’s 6:30 in the morning and you still haven’t figured out how to efficiently query a database of latitude and longitude coordinates within an N kilometre range.
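For anyone facing the same problem at their own event: the standard trick is to do the great-circle distance math in the query itself. A minimal sketch in MySQL, assuming a hypothetical places table with lat and lng columns in degrees (:lat and :lng are your point, :n the radius in kilometres):

SELECT id, lat, lng,
       (6371 * ACOS(
           COS(RADIANS(:lat)) * COS(RADIANS(lat))
         * COS(RADIANS(lng) - RADIANS(:lng))
         + SIN(RADIANS(:lat)) * SIN(RADIANS(lat))
       )) AS distance_km
FROM places
HAVING distance_km <= :n
ORDER BY distance_km;

Adding a cheap bounding-box WHERE clause on indexed lat/lng columns before the trig keeps it fast, since the distance math alone forces a full table scan.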

Also, because I live a fair distance away and was sleep-deprived, I ended up missing the after-party, which would have been great to attend.

Commend efforts, not just results

It is amazing what people can accomplish in a few days, and even if the results are less than ideal it is important to make sure that they feel good about themselves. Startup Weekend, in my opinion, is all about learning new things, helping people out, and connecting with people you would not normally meet. It is important to give back and help out creative communities, and part of this involves recognizing people’s feats and giving them encouragement to continue.

On a final note

Fake everything. Fake the presentation, fake the product, pad the numbers and make it look good.

We created an actual application; it worked on the back end, connected to the servers, and handled the information coming back through the wire. We created a prototype, not an MVP. A minimum viable product should be as simple as possible and just for demonstration purposes. It doesn’t need to work; it just needs to look like it does.

APIs and Standards

Recently I have been doing a lot of work with APIs. I have so far tied into YouTube, Flickr, Wordnik, isoHunt, and a few other rather big ones. I love putting together code and having it just work. I also love working with external services and having proper, clean documentation laid out in front of me.

The reason I am creating this post is primarily the lack of standards I have seen when it comes to documentation. So far the best documentation has been from YouTube and Wordnik, and the worst was from isoHunt; it was really just a forum post with an example call.

Flickr had some issues for me when it came to documentation, because in my eyes a web service should have four main requirements listed on each call’s page.

  • URL and method required to connect
  • Example request
  • Example response
  • Parameters with flags for optional or required

All of these things are rather simple and are the core requirements when it comes to documenting a REST service call. It also helps to state the response types, such as XML, JSON, or JSONP, when displaying examples.
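To make that concrete, here is roughly what I would want a single call’s page to look like. The endpoint, parameters, and response below are made up for illustration:

GET http://api.example.com/v1/photos/search

Example request:
GET /v1/photos/search?q=sunset&format=json

Example response:
{ "total": 1, "photos": [ { "id": 101, "title": "Sunset over Kamloops" } ] }

Parameters:
q       (required)  search terms
format  (optional)  response type: xml, json, or jsonp; defaults to xml

Four short blocks, and a developer can make their first successful call without reading anything else.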

I have run across two rather annoying things when working with these various APIs. The first is that Yahoo should show, on each API method’s page, that you can request JSON by adding a parameter to the query. The second issue that I came across was the YouTube object variable names.

The YouTube API, in its most recent version, uses dollar signs in variable names… This is a huge pain when working in PHP, because the dollar sign breaks a normal property access and causes an error. The way around it is rather simple, but still, it should not be necessary to access an object property in such a ‘hacky’ manner:

$result->{'media$item'}->{'$t'}
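For anyone hitting the same wall, here is a fuller sketch of the workaround. The JSON sample is made up, but it mirrors the GData style of naming:

<?php
// A made-up response in the GData style; property names contain '$'.
$json = '{"media$item": {"$t": "My video title"}}';
$result = json_decode($json);

// $result->media$item would be a parse error in PHP,
// so the property name has to be quoted inside braces:
$title = $result->{'media$item'}->{'$t'};
echo $title; // My video title

// Alternatively, decode to associative arrays and sidestep it entirely:
$data = json_decode($json, true);
echo $data['media$item']['$t'];

Decoding to arrays is the cleaner option if the rest of your code doesn’t depend on objects.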

The only other complaint that I have, which you should correct in the next version of your API, is that variables need to be consistently named. This was an annoyance when working with the isoHunt API: all of the variables started with a lower-case first character except for one. I don’t care whether your variables are joined with underscores or use camelCase, just make them consistent, please.

die("EOR: End of rant.");

Particles: Isometrics

Currently the only real work that has been going on is updates to the crawler and getting all of the resources figured out. I have set aside a little bit of time to work on Particles. Particles is a JavaScript engine that can easily be added into a PhoneGap build.

I have gone on in previous posts about the project and the intentions I have for it. If you would like to read more about it, then I suggest that you go here: Shameless project plug

I have started working on an isometric view that will allow scaling and, in the future, panning. All of this requires me to build an overhead grid that the isometric view and renderer will plug into.
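For reference, the standard conversion from overhead grid coordinates to isometric screen coordinates is:

screenX = (gridX - gridY) * tileWidth / 2
screenY = (gridX + gridY) * tileHeight / 2

Scaling then amounts to a multiplier on the tile dimensions, and panning is an offset added to screenX and screenY after the conversion.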

If you are interested in the project then I suggest you fork a copy or get in contact with me. All of the information can be found on the repository wiki pages.

Forgotten Crawler Updates: API

I forgot to mention this in the last post, and seeing as it is a pretty big addition, I figured that I should mention it now. I have created an API to allow specified people access to the crawler and the database that drives the search technology. This will allow selected people to perform searches, add items to the queue, get host information by searching, and much more.

Because this API is pretty locked down and others do not have access to it, I am offering a limited version, which I am still working on, to the outside world. It is not currently finished, but if you would like to take a look at it, send an email to caleb [dot] jonasson at gmail and make the subject “lirkrawler api access”. It will help if you tell me why you are interested in getting your hands on the API.

Note that if people start abusing the system or performing large database hits with insane limits, I am just going to turn off their access, since this project is currently running on a home server.

Web Crawler Update: Hardware, Storage, Threading and The Intelligent Queue

A lot has changed with the web crawler since I last posted an update. Even though the posted updates have been scarce and few, the project has had nightly revision changes even when I am swamped with other projects; there is always time for the crawler. In this post I will be talking about four main updates: hardware, storage, threading, and the intelligent queue.

Hardware

The main upgrade was to the hardware of the server. Upon discovering that the key efficiency was consistently less than 95%, I had to find a way to either make it more efficient or increase the buffer size. The solution was backing up the database and moving it off of the server. Once this was done I nuked MySQL from the server and made that box a dedicated Apache box. Then came the fun of ordering a new MySQL server. The new box contains 2x 2TB HDDs, currently in RAID 1. The box also has 32GB of memory, which in theory should get rid of my key efficiency issues. Then it was time to move the database onto the new box and get things rolling again.
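For anyone who wants to check this on their own box, assuming key efficiency means the MyISAM key buffer hit rate, you can work it out from two status counters:

SHOW GLOBAL STATUS LIKE 'Key_read%';
-- key efficiency = 1 - (Key_reads / Key_read_requests)

and then raise the buffer in my.cnf (the figure below is just an example, not my actual setting):

[mysqld]
key_buffer_size = 8G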

Storage

I realized that how I was storing the information was unrealistic. A colleague of mine recommended that I store the pages in a file system rather than in a database, but this didn’t really appeal to me; I would rather parse the page line by line and store a linked page than move to a file system. The solution for now is to hash the pages and make sure that I am not storing the same page twice. The page nid is then linked back to the URL that was crawled. This seems to have made everything much more efficient, especially since, when crawling websites with user-generated content, there are a lot of 404 pages or pages that display the same thing as another page I may have crawled in the past. With Twitter alone there were roughly 176,000 duplicate pages stored in the database, and 780,000+ of the pages from Amazon were duplicates.
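A minimal sketch of the dedupe check, assuming a hypothetical schema with a pages table keyed by a content hash and a urls table linking back to it (these names are illustrative, not the crawler’s actual tables):

<?php
// Hash the fetched body and reuse an existing page row when possible.
function storePage(PDO $db, string $url, string $body): int {
    $hash = sha1($body);

    // Does an identical page already exist?
    $stmt = $db->prepare('SELECT nid FROM pages WHERE hash = ?');
    $stmt->execute([$hash]);
    $nid = $stmt->fetchColumn();

    if ($nid === false) {
        // New content: store it once.
        $stmt = $db->prepare('INSERT INTO pages (hash, body) VALUES (?, ?)');
        $stmt->execute([$hash, $body]);
        $nid = (int) $db->lastInsertId();
    }

    // Link the crawled URL back to the (possibly shared) page nid.
    $stmt = $db->prepare('INSERT INTO urls (url, page_nid) VALUES (?, ?)');
    $stmt->execute([$url, $nid]);

    return (int) $nid;
}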

Threading

Probably the best thing that I added to the crawler was the pseudo-threading process. Because the application is PHP based, the crawling process starts on a heartbeat that occurs every 15 seconds. This heartbeat kicks off multiple crawl scripts, and thus the process begins. Before I had the threading enabled I had to set the heartbeat to about 5 minutes, because if a crawl took longer than expected and the database selects and inserts queued up, the system would snowball and everything would cease to work efficiently and properly. Keep in mind that there wasn’t a new MySQL database at this time.

I created a table that tracks each crawl instance and controls how many tasks are currently running. These threads can be stopped, started, and locked, which gives me a simple way to turn off the crawler or, when testing, turn on only one thread at a time. Each thread records which URL it is currently crawling, which removes any chance of the same URL being crawled twice at the same time.
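A rough sketch of how a heartbeat script might claim a slot in that table before crawling. The table and column names here are illustrative, not the crawler’s actual schema:

<?php
// Called at the start of each heartbeat-spawned crawl script.
// Returns true if a thread slot was claimed, false if all are busy.
function claimThreadSlot(PDO $db, string $url): bool {
    // A single UPDATE is atomic, so two heartbeat scripts can never
    // claim the same slot. A unique index on current_url would
    // additionally guard against two slots taking the same URL.
    $stmt = $db->prepare(
        'UPDATE crawl_threads
            SET locked = 1, current_url = :url
          WHERE locked = 0 AND current_url IS NULL
          LIMIT 1'
    );
    $stmt->execute(['url' => $url]);
    return $stmt->rowCount() === 1;
}

When the crawl finishes (or dies), the slot gets unlocked and current_url cleared, so the next heartbeat can reuse it.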

Because there is a finite number of simultaneous page crawls happening at any one time, I can be sure that even if a queue forms on the MySQL server it will never get out of hand. Any snowballing is bounded, and thus effectively impossible.

The Intelligent Queue

There is now a proper queue being used by the system! This queue has some rules that allow for better crawling of the web. First, it cannot exceed 30,000 URLs; this is because I want selects from the table to be swift and I don’t want the queue to become too large. Another rule is that the queue can only be populated by the queue generator class. This class picks candidates based on a set of rules that can be configured and changed. Currently it favours uncrawled home pages of websites, i.e. URLs with a URI of ‘/’. After that it favours hosts with a very high combined link-back rate. Currently the host with the highest link-back count is Twitter… no surprise there. The second highest is Tumblr.

This process of generating the queue may not be the most efficient way of crawling the best content on the web, and this is something that I have monitored and am aware of. Because the most linked-back content isn’t necessarily the best content on the web (it’s mainly social media websites), I put a limit on how many URLs from the same host can be added to the queue. After this was done I hard-coded Wikipedia’s host nid into the queue generation to make sure that I was getting some sane and pretty reliable content to crawl.
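To give a feel for the selection rules, a queue-generation pass boils down to something like the following query (hypothetical schema again, and the real generator also applies the per-host cap):

INSERT INTO queue (url_id)
SELECT u.id
FROM urls u
JOIN hosts h ON h.id = u.host_id
WHERE u.crawled = 0
ORDER BY (u.uri = '/') DESC,    -- uncrawled home pages first
         h.linkback_count DESC  -- then heavily linked-back hosts
LIMIT 30000;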

Shameless Project Plug

GitHub is where it’s at these days, and so is the mobile application marketplace (or so I hear). I have been working diligently, 5 minutes a day, on a project called Particle. What was going to be a particle engine written in JavaScript turned into something much more than just a particle engine. It became a beast. A monster. A JavaScript engine like all others.

The idea is to create an application framework for mobile application development using JavaScript. I would like this project to tie into the almighty Adobe-Apache Cordova project, not officially of course, but with enough configurable hooks to be easily used with PhoneGap.

Features

Particles will allow rapid development and prototyping of canvas-based applications. Some of the features are:

  • Error logging to the console, stored in the application as a stack, allowing different levels to be displayed.
  • Handlers that allow different renderers for the canvas, i.e. isometric, 2D, faux 3D, etc.
  • Easy set-up for browser or mobile applications.
  • Configurable resource loading.
  • Application state handling.
  • Dynamic canvas resizing.

Code Repository

All of the code for the application will be stored on GitHub, which can be accessed here. If you would like to partake or use the code base, feel free to fork it, although I am looking for people to work on the original code base.

Particles Code Repo

Sumsumsum.com updates

If you are following along with the recent posts about what has been going on around here, then you will know that I have been on a bit of a JavaScript kick for the past few months. Because of this kick I have decided to start writing some lessons for those interested in learning JavaScript.

I plan on posting these lessons not only to http://sumsumsum.com but also to http://codewithdesign.com. If you are interested in reading over these articles, I will provide the links.

Along with the updates to sum3, I have been writing more articles for Code with Design.

Day 4 of development results.

The game thus far has gotten a little more complicated. The graphics are less than ideal, but I have spoken to a man who knows graphics, so hopefully that will get going soon enough. I worked on porting to Android last night, and the whole process took about 2 hours, which wasn’t bad (that time includes environment setup).

The problems I am currently facing are the following. First, the canvas on Android is slow; I haven’t hit any issues yet, and the game is running at 20 frames per second, but I feel like this is going to lag in long-lasting games, so cleanup will need to be done on some of the objects in the game. The second problem is that I am using Cordova (formerly PhoneGap), which is an excellent product, but it is not built for game development, so there is a lot of unnecessary overhead, which makes me want to do it all again in Java.

There is a demo version located here, but it doesn’t include a pretty large list of features.

http://atomicbucket.com/games/units/v1/

The linked version lacks the randomly generated map, balanced units, some specific logic, and a large list of bug fixes.

I will not be working on the game today, but there is a very large list of things to do that will make this game a little more unique. One of them is sound…