Posts
-
Efficient Speed
Yesterday, I was given an interesting problem to tackle.
We were given a bunch of laptops, 8 of them to be exact, already cloned but missing almost 15 GB of important user data. Re-cloning these machines is not an option, as the source image is not available to us. The only way is to copy the 15 GB of files to each machine, no two ways about it. The files sit on a 500 GB external USB hard disk. I have Ethernet cables and 2 Ethernet switches.
The big question is how?
Of course, copying from the hard disk onto each laptop one after another, manually or via Sneakernet, is the favourite answer, but no. I can only call that desperate, physically constrained, or intellectually apathetic.
I’m a person who loves processes, systems, and automation. Copying a bunch of files serially and manually onto so many computers is unacceptable, especially when you have to rinse and repeat a whole 8 times. Suffering a little pain to get some infrastructure up, just so the copying can then run automatically and painlessly, is what I’m looking for. 先苦后甜 (first the bitter, then the sweet).
Out of ideas, I pinged a few people via SMS: “Hey, what is the most efficient way to transfer 15 GB of data onto 8 different laptops, without cloning?”
Portable hard disk / Sneakernet and Samba CIFS were the few answers that came in. Someone suggested copying from one to two, two to four, four to eight, but that’s too tedious and not scalable, equipment-wise. But what if the medium used is Ethernet?
I probed further, “multicast network solutions?”
“BitTorrent”. Bingo. Thanks to cflee for that great suggestion! That’s the term and I knew it would certainly work. I did read up on the BitTorrent protocol some time back and am quite disappointed that this didn’t occur to me earlier. He also mentioned that uTorrent provides a built-in tracker, and that there’s a handy guide available.
Spent 10 minutes reading through it and successfully gave it a trial run between 2 computers on my home network. Conceptually, the prototype had been demonstrated, and there was no way it could fail the next day.
Spent the following morning with a few co-workers digging up rarely used networking equipment, then wired up the machines. The two 4-port switch-cum-wireless APs were miserable: they left us with only 6 usable LAN ports, so the other 2 machines had to make do with 802.11g wireless. It would work, just a little slower. I was hoping to finish this whole ordeal before the end of the day, i.e. 5.30 pm, and go home on time. After all, copying the 15 GB from the portable hard disk onto one of the laptops had already taken a grand total of 60 minutes. Doing this serially would take no less than 8 hours. Portable hard disks are scarce, too, especially ones big enough for data this size.
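The back-of-the-envelope numbers behind that estimate, and behind the earlier one-to-two, two-to-four suggestion, can be sketched in Python. This assumes every copy (disk to laptop or laptop to laptop) takes the same 60 minutes measured above, and that each machine holding the data can feed only one other machine at a time; neither assumption was measured for LAN copies.

```python
COPY_MINUTES = 60  # measured: one 15 GB copy from the USB hard disk
LAPTOPS = 8


def serial_minutes(machines: int, copy_minutes: int = COPY_MINUTES) -> int:
    """Copy from the single USB disk to each machine, one after another."""
    return machines * copy_minutes


def doubling_rounds(machines: int) -> int:
    """Rounds needed if every holder of the data (disk included) feeds one new machine per round."""
    holders, rounds = 1, 0  # initially only the USB disk has the files
    while holders < machines + 1:  # +1 because the disk itself counts as a holder
        holders *= 2
        rounds += 1
    return rounds


if __name__ == "__main__":
    print(serial_minutes(LAPTOPS))                  # 480 minutes, i.e. 8 hours
    print(doubling_rounds(LAPTOPS) * COPY_MINUTES)  # 4 rounds, 240 minutes
```

Even the doubling scheme needs four rounds of cable shuffling; BitTorrent goes further by letting every machine exchange pieces with every other machine at the same time.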
I configured dhcpd, got the whole network running nicely, and proceeded to install uTorrent on all the machines (skipping the rubbish, ad-supported nonsense). That took barely 10 minutes, thanks to a Samba (CIFS) share. It would be cool to have an automatic installer distribution system, but I didn’t have time for one.
Created the torrent for the initial seed according to the guide; that took almost 15 minutes. Thousands of tiny files mixed with gigantic ones, whatever you can imagine: the limits of the filesystem were being tested here.
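Why does creating the torrent take so long? A BitTorrent (v1) .torrent file stores a SHA-1 hash for every fixed-size piece of the content, with all the files treated as one concatenated byte stream, so the client has to read all 15 GB once. The sketch below shows only that hashing step. The 256 KiB piece length is an assumption (clients normally pick it automatically), and the bencoded metainfo file that uTorrent actually writes is not reproduced.

```python
import hashlib
from typing import Iterable, Iterator, List


def read_files(paths: Iterable[str], block_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield the contents of the files back to back, as one continuous stream."""
    for path in paths:
        with open(path, "rb") as f:
            while chunk := f.read(block_size):
                yield chunk


def piece_hashes(stream: Iterable[bytes], piece_length: int = 256 * 1024) -> List[bytes]:
    """Split the stream into fixed-size pieces and return the SHA-1 digest of each."""
    hashes: List[bytes] = []
    buf = bytearray()
    for chunk in stream:
        buf += chunk
        while len(buf) >= piece_length:
            hashes.append(hashlib.sha1(buf[:piece_length]).digest())
            del buf[:piece_length]
    if buf:  # the final piece may be shorter than piece_length
        hashes.append(hashlib.sha1(buf).digest())
    return hashes
```

Usage would be along the lines of `piece_hashes(read_files(["a.bin", "b.bin"]))` (hypothetical file names). Because pieces ignore file boundaries, thousands of tiny files still end up packed into whole pieces, but each one has to be opened and read.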
Started seeding on the tracker machine and turned on ‘Initial Seeding’ while I distributed the newly created .torrent file to the rest of the machines.
Changed back to standard seeding once all the machines had entered the swarm.
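Once the swarm is running, the peers themselves do most of the work. Most clients choose which piece to request next with a rarest-first policy: ask for the piece that the fewest peers in the swarm hold, so every piece quickly exists in several places. The sketch below shows that selection step for a single peer. It is a simplification (real clients also pick random pieces at the very start and switch to an "endgame" mode near the end), and it does not model the seeder-side ‘Initial Seeding’ (super-seeding) mode; the peer names are made up.

```python
from collections import Counter
from typing import Dict, Optional, Set


def pick_rarest(have: Set[int], peer_pieces: Dict[str, Set[int]]) -> Optional[int]:
    """Return the missing piece held by the fewest peers, or None if nothing useful is on offer."""
    availability = Counter()
    for pieces in peer_pieces.values():
        availability.update(pieces)  # count how many peers hold each piece
    wanted = [piece for piece in availability if piece not in have]
    if not wanted:
        return None
    # Fewest copies first; ties broken by piece index so the result is deterministic.
    return min(wanted, key=lambda piece: (availability[piece], piece))
```

With a seed holding pieces 0 to 3, one laptop holding {0, 1} and another holding {0, 2}, a laptop that already has piece 0 asks for piece 3 first, since only the seed has it.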
Thinking about the 8 hours the conventional approach would have taken, I grinned and went on to do other work, giving my forecast of completion as ‘End of the Day’.
The seeding started at around 9 to 9.30 am. I drove out to buy breakfast for everyone and came back at around 10.30 am.
I took a peek at the progress and got the shock of my life.
All the wired Ethernet clients were already seeding, 100% downloaded! Only the 2 miserable wireless clients were left struggling with their slow connections. I swapped their network connections with those of 2 finished computers, giving them wired links, and saw the download speed shoot through the roof.
12.2 MB/s, which works out to ~100 Mbps (12.2 × 8 ≈ 98 Mbps).
Every 30 seconds or so, the download speed would dip a little and uTorrent would pop up a warning in the status bar: “Harddisk overload 100%”. Wow, a solidly lit hard disk LED.
I’m impressed.
Darn. I thought the transfer would take the whole day, giving me time for a well-deserved break, but little did I know that it would be complete before I even had lunch!
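For the record, here is the arithmetic behind those numbers, assuming uTorrent’s "MB/s" means megabytes per second (8 megabits each), treating 1 GB as 1024 MB, and ignoring protocol overhead. 12.2 MB/s is just under 100 Mbps, consistent with 100 Mbps switch ports, and at that rate 15 GB takes about 21 minutes per laptop.

```python
def megabits_per_second(megabytes_per_second: float) -> float:
    """Convert a rate in megabytes per second to megabits per second (1 byte = 8 bits)."""
    return megabytes_per_second * 8


def transfer_minutes(size_mb: float, rate_mb_per_s: float) -> float:
    """Time to move size_mb megabytes at a constant rate, in minutes."""
    return size_mb / rate_mb_per_s / 60


if __name__ == "__main__":
    print(megabits_per_second(12.2))                 # about 97.6 Mbps
    print(round(transfer_minutes(15 * 1024, 12.2)))  # about 21 minutes
```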
So, now you know. BitTorrent is extremely efficient at one-to-many, many-to-many and many-to-one distribution tasks, because every peer that has a piece also uploads it to the others, which stops the original seed from being the bottleneck. As long as the overhead of installing and running uTorrent on every machine is worth paying, this is an extremely useful piece of software to add to any sysadmin’s arsenal.
Some other hidden benefits of BitTorrent: it is resumable and repairable (every piece is checked against a hash, so only missing or corrupt pieces are downloaded again), distributed (many to many; any seeder or peer can enter or leave the swarm without much disruption or human intervention), lightweight (a 300 KB installer), and automated (once started, it runs unattended and handles disconnections gracefully).
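The resume-and-repair property comes from the per-piece hashes stored in the .torrent file: a client can re-hash whatever it already has on disk and re-request only the pieces that are missing or fail the check. A sketch, assuming SHA-1 piece hashes as in BitTorrent v1 and, for simplicity, the whole payload held in one bytes object (a real client reads piece by piece from the files):

```python
import hashlib
from typing import List


def pieces_to_refetch(data: bytes, expected: List[bytes], piece_length: int) -> List[int]:
    """Indices of pieces that are missing, truncated or corrupt in `data`."""
    bad = []
    for index, digest in enumerate(expected):
        piece = data[index * piece_length:(index + 1) * piece_length]
        if hashlib.sha1(piece).digest() != digest:
            bad.append(index)  # only these pieces need downloading again
    return bad
```

A single flipped byte costs one piece, and an interrupted download simply shows up as a run of missing pieces at the end.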
Really, BitTorrent has its legitimate use as above, quod erat demonstrandum (Q.E.D.).
· · ·
Spanning Sync
I might have mentioned this before but I’ll do this again.
Spanning Sync, now in its 3rd version, is an awesome app. This application / service seamlessly syncs iCal with Google Calendar, and Address Book with Google / Gmail Contacts.
As I sync all my personal mobile devices with Sync Services on my Mac, all my contacts and calendar entries flow between my iPod, my mobile phone and my MacBook automatically and beautifully, both ways. I lived with this setup for a long time before switching to Google Apps, back when I handled my mail on my own server.
With the overwhelming amount of spam coming into my mailbox, to the tune of ~40 pieces a day (which works out to ~1,200 pieces a month), my SpamAssassin setup was starting to fall behind. My once spotless Inbox started to fill up with false negatives (spam that slipped past the filter) despite my dedicated training programme for the Bayesian classifier. Spending a few minutes every day looking at spam and deleting rubbish is an inelegant chore, so I bit the bullet and migrated to the Google Apps hosted mail solution. At the time it was still in beta and there wasn’t a paid enterprise service yet, but the spam filtering was top notch, much better than Yahoo Mail and Hotmail combined. I have to admit my SpamAssassin setup filtered out more spam than the Yahoo and Hotmail services of the day, but Gmail does spam management better.
With the move to Google Apps, my contacts and events were no longer integrated with Gmail / Google’s interface. There were times when I clicked on the To: link, looking forward to selecting a recipient from my Address Book, only to be disappointed: there is no link with my Mac OS X Address Book! Apple did come up with Gmail sync for the iPod Touch, but it is still flaky.
Enter Spanning Sync, and the missing link is restored. I now have my contacts and events everywhere, updated and synchronised seamlessly. What elegance!
I did try out Spanning Sync previously, but back then the sync was flaky and duplicate entries in Google Calendar were quite common. 2 versions later, the algorithms have matured considerably, and I can trust it to do its job hands-off.
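Duplicates are the classic failure of two-way sync: if the two sides cannot tell that two entries are the same event, each side copies the other’s version in as a new entry. A minimal sketch of the usual remedy is to key every entry by a shared identifier and let the most recently modified copy win. This is not Spanning Sync’s actual algorithm, which isn’t documented here; the `Event` type and its fields are hypothetical, and deletions and conflict prompts are not handled.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Event:
    uid: str       # identifier shared by both calendars (hypothetical field)
    title: str
    modified: int  # last-modified time, e.g. a Unix timestamp


def two_way_merge(local: Dict[str, Event], remote: Dict[str, Event]) -> Dict[str, Event]:
    """Merge two calendars keyed by uid; on conflict the newer copy wins."""
    merged: Dict[str, Event] = {}
    for uid in local.keys() | remote.keys():
        mine, theirs = local.get(uid), remote.get(uid)
        if mine is None:
            merged[uid] = theirs
        elif theirs is None:
            merged[uid] = mine
        else:
            merged[uid] = mine if mine.modified >= theirs.modified else theirs
    return merged
```

Writing the merged result back to both sides leaves each event exactly once on each side, however many times the sync runs.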
So, they’re currently running a promotion, giving out $5 promo codes and a $5 referral bonus. If you do intend to get a subscription or a lifetime licence, use this link, or enter the promo code KQR9TP at checkout, to save $5. After that, you also save $5 for every friend you refer.
Full disclosure: I paid for a 1 year subscription to this service.
· · ·
Open Word
I spend much of my time fixing broken documents sent my way, usually broken because users misuse word processing features and do manually what the word processor can do automatically. I have seen people spend no less than 2 hours trying to get their document formatting into shape, just because they didn’t let automation do the job.
I shall list a few very common usage scenarios, how people get them wrong, and how they should be done efficiently.
Paragraph Spacing
I’ll start off with ‘paragraph spacing’. Ideally, every press of the Enter key ends a paragraph, and a gap is automatically formed before the next one, as follows:-
However, most people just press Enter twice, producing the ‘same’ effect. This creates consistency issues across a large document, and unsightly gaps appear at the top of a page whenever an empty paragraph spills over onto it. The user then deletes empty paragraphs manually and arbitrarily to work around the problem.
The ideal solution is to set the Spacing Before / After values in the Paragraph formatting dialogue.
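To make the difference concrete, here is a toy model (not Word’s file format) in which spacing is a property of a paragraph rather than an extra empty paragraph. The hypothetical `absorb_blank_paragraphs` helper does the clean-up I usually end up doing by hand: it removes "spacer" paragraphs and adds their height to the spacing after the preceding paragraph. The 12 pt per blank paragraph is an assumption (roughly one line of 12 pt text).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Paragraph:
    text: str
    space_after_pt: float = 0.0  # like the Spacing After value in the Paragraph dialogue


def absorb_blank_paragraphs(paras: List[Paragraph], blank_height_pt: float = 12.0) -> List[Paragraph]:
    """Return a copy with empty paragraphs turned into spacing on the preceding paragraph."""
    cleaned: List[Paragraph] = []
    for para in paras:
        if not para.text.strip():
            if cleaned:
                cleaned[-1].space_after_pt += blank_height_pt
            continue  # blank paragraphs at the very start are simply dropped
        cleaned.append(Paragraph(para.text, para.space_after_pt))
    return cleaned
```

Because the gap now belongs to the paragraph above it, there is no empty paragraph left to spill onto the top of the next page.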
Indentation
The next significant time waster is paragraph indentation. Users who want to move a paragraph inwards spam the Tab key at the start of every line that sticks out. Disaster awaits as soon as the paragraph is amended and its lines reflow.
Here’s what I mean:-
The better way to do it is to use the indent buttons:-
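A plain-text analogy using Python’s standard `textwrap` module: when the indent is a property applied at layout time, like a paragraph’s left indent, every line stays indented however the text is edited and re-wrapped. Tabs typed at the start of each visible line, by contrast, stay attached to the words after them and drift into the middle of lines once the paragraph reflows. The 40-character width and four-space indent are arbitrary choices for the example.

```python
import textwrap


def layout(text: str, width: int = 40, indent: str = "    ") -> str:
    """Wrap a paragraph, applying the indent to every line at layout time."""
    return textwrap.fill(text, width=width, initial_indent=indent, subsequent_indent=indent)


if __name__ == "__main__":
    original = "The quick brown fox jumps over the lazy dog " * 3
    amended = "Amended: " + original
    print(layout(original))
    print()
    print(layout(amended))  # re-wrapped text is still indented on every line
```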
Keeping with the Next Paragraph
Sometimes, certain paragraphs, such as headings, list introducers and titles, logically belong with the paragraph that follows them and must not be the last item on a page before the text flows on to the next one. Users tend to work around this by inserting unnecessary empty paragraphs (by pressing Enter) right before the heading, hoping to push it down to the next page. This might seem to work, but when the text above is amended, the stray blank paragraphs start to pollute the document, increasing the amount of clean-up needed before printing.
My recommended solution is to enable the ‘Keep with next’ option (under Format -> Paragraph) for all headings and titles:-
This way, headings and titles stay on the same page as the paragraph that follows them. This can also be applied to table headings if you don’t want a table’s first (header) row to sit alone at the end of a page.
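The effect of ‘Keep with next’ can be seen in a toy pagination model. It assumes paragraphs never split across pages and that their heights in lines are known, and it only looks one paragraph ahead; Word’s real layout engine (widow/orphan control, splitting paragraphs, chains of kept paragraphs) is far more involved.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Para:
    text: str
    lines: int
    keep_with_next: bool = False


def paginate(paras: List[Para], page_lines: int) -> List[List[str]]:
    """Place whole paragraphs on pages; a keep-with-next paragraph moves with its successor."""
    pages: List[List[str]] = [[]]
    used = 0
    for i, para in enumerate(paras):
        needed = para.lines
        if para.keep_with_next and i + 1 < len(paras):
            needed += paras[i + 1].lines  # must fit together with the next paragraph
        if used and used + needed > page_lines:
            pages.append([])  # start a new page instead of stranding the paragraph
            used = 0
        pages[-1].append(para.text)
        used += para.lines
    return pages
```

With 10-line pages, an 8-line paragraph, a 1-line heading and a 3-line paragraph, the heading is stranded at the bottom of page 1 unless `keep_with_next` is set, in which case it moves to page 2 together with its text, with no blank paragraphs involved.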
Styles
If you’re a web designer, think of Styles as Cascading Style Sheets for your text documents. Styles inherit from their parent styles and help unify the formatting structure of your document with fewer clicks. In Microsoft Word, the ‘Normal’ style is the baseline style from which all body text in your document takes its default look; in OpenOffice Writer it is intuitively named ‘Body Text’.
Then, there are ‘Heading 1’, ‘Heading 2’ and all other levels of headings you’ll need in a generic and simple document.
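Style inheritance works much like the cascade in CSS: a style records only what it changes, and everything else comes from the style it is based on. A sketch of that lookup, with made-up style definitions (the fonts and sizes are illustrative, not Word’s defaults):

```python
from typing import Any, Dict, Optional

STYLES: Dict[str, Dict[str, Any]] = {
    "Normal": {"based_on": None, "font": "Times New Roman", "size_pt": 12, "space_after_pt": 6},
    "Heading 1": {"based_on": "Normal", "size_pt": 16, "bold": True, "keep_with_next": True},
    "Heading 2": {"based_on": "Heading 1", "size_pt": 14},
}


def resolve_style(name: Optional[str], styles: Dict[str, Dict[str, Any]] = STYLES) -> Dict[str, Any]:
    """Return the effective formatting of a style by walking up its 'based on' chain."""
    chain = []
    while name is not None:
        chain.append(styles[name])
        name = styles[name]["based_on"]
    effective: Dict[str, Any] = {}
    for style in reversed(chain):  # ancestors first, so the child's settings override them
        effective.update({key: value for key, value in style.items() if key != "based_on"})
    return effective
```

Change `font` on ‘Normal’ and both headings pick it up, which is exactly why styles save clicks.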
However, if you happen to create the kind of documents I have a hand in, you might have to deal with numbered paragraphs whose child paragraphs use a different numbering scheme. Here, the Bullets and Numbering feature can work hand in hand with the word processor’s Styles to relieve you of tedious formatting.
All the paragraph and sub-paragraph numbering is generated automatically. The only button you need to click is the indent button, which assigns the paragraph a particular ‘level’. The exact details of how to set this style up are left as an exercise for the reader; I’ll leave the screenshots below as hints.
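The automatic part is easy to picture: the word processor only stores each paragraph’s level and derives the numbers from the order of the paragraphs. A sketch assuming a ‘1.’, ‘(a)’, ‘(i)’ scheme, which is an illustration rather than necessarily the scheme in my documents; inserting or deleting a paragraph simply renumbers everything after it.

```python
from typing import Callable, List


def to_roman(n: int) -> str:
    """Lower-case Roman numeral for a positive integer."""
    numerals = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"), (90, "xc"),
                (50, "l"), (40, "xl"), (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, symbol in numerals:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def to_letter(n: int) -> str:
    """1 -> 'a', 2 -> 'b', ... (assumes at most 26 items per level)."""
    return chr(ord("a") + n - 1)


# Hypothetical scheme: level 0 = "1.", level 1 = "(a)", level 2 = "(i)".
FORMATS: List[Callable[[int], str]] = [
    lambda n: f"{n}.",
    lambda n: f"({to_letter(n)})",
    lambda n: f"({to_roman(n)})",
]


def number_paragraphs(levels: List[int]) -> List[str]:
    """Derive every paragraph's number from its level and its position in the document."""
    counters = [0] * len(FORMATS)
    labels = []
    for level in levels:
        counters[level] += 1
        for deeper in range(level + 1, len(counters)):
            counters[deeper] = 0  # a new parent restarts the numbering of its children
        labels.append(FORMATS[level](counters[level]))
    return labels
```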
This concludes my brief gripe about word processor misuse. In a future post, I might talk about version control and document management.
· · ·
Aardvark
There’s an interesting new social service concept known as Aardvark. It is basically a question-to-answer matchmaker. You ask Aardvark a question and it sends your question to people within your social network (Facebook, etc.) who are on Aardvark. The people your question reaches have interests matching it, so the answers you get are more focused and appropriate.
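The matchmaking idea can be sketched as ranking the people in your network by how many of the question’s topics overlap with their interests. This is only my illustration of the concept; Aardvark’s actual routing method isn’t described here, and the names and topics are made up.

```python
from typing import Dict, List, Set


def route_question(topics: Set[str], network: Dict[str, Set[str]], k: int = 2) -> List[str]:
    """Pick up to k people whose interests overlap most with the question's topics."""
    scored = []
    for person, interests in network.items():
        overlap = len(topics & interests)
        if overlap:  # people with no matching interest never see the question
            scored.append((overlap, person))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best match first, then by name
    return [person for _, person in scored[:k]]
```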
Schematically:-
I think this service has great potential and utility. Check it out!
· · ·