By Stephen Shankland
November 2, 2009
Betting that the benefits of the move will outweigh the risks, Yahoo has released the source code underlying in-house software called Traffic Server that can speed up Web site operations.
The software works by moving some data and operations closer, in network terms, to the people using those services. Yahoo released it as an "incubator" project under the auspices of the Apache Software Foundation, a seasoned organization for managing open-source projects and also the home of Hadoop, the open-source project Yahoo favors for large-scale data-processing challenges.
Shelton Shugar, Yahoo's senior vice president of cloud computing, plans to announce the move at the Cloud Computing Expo in Santa Clara, Calif., on Tuesday in a keynote speech, but the software actually arrived at Apache last week.
"We've donated Traffic Server to Apache because we think it's a great piece of code, and we want to build a community around that in the same manner we built a community out of Hadoop," Shugar said in an interview.
Traffic Server is a battle-hardened package with more than 200,000 lines of C++ code. Yahoo originally got the software through its acquisition of Inktomi earlier this decade, and it's been using it ever since. Today, the software delivers 30 billion Web objects and 400 terabytes of data each day.
And Yahoo can rightly be proud of Traffic Server's performance: all that traffic is handled by a surprisingly small number of Yahoo servers, between 100 and 150, said Chuck Neerdaels, vice president of data services at Yahoo. The software is built specifically to run many tasks at the same time, an approach well-suited to today's servers with multicore, multithreaded processors.
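To make "running many tasks at the same time" concrete, here is a minimal Python sketch of the general idea: incoming requests are spread across a pool of workers sized to the machine's processor cores. This is not Traffic Server's code, which is written in C++ with its own threading design; the names `handle_request` and `serve_all` are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import List


def handle_request(path: str) -> str:
    """Hypothetical stand-in for the work of serving one Web object."""
    return f"200 OK {path}"


def serve_all(paths: List[str]) -> List[str]:
    # One worker per available CPU core; os.cpu_count() can return None.
    workers = os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() runs the requests concurrently but returns results in input order.
        return list(pool.map(handle_request, paths))


if __name__ == "__main__":
    print(serve_all(["/index.html", "/logo.png", "/style.css"]))
```

The sketch shows structure only: in CPython, threads do not execute Python code truly in parallel, whereas a C++ server such as Traffic Server can keep every core busy at once.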
Source code is what humans write in a higher-level programming language; only after it's been translated into binary machine code can a computer actually run that program. When software is released as open source, its source code is available for anyone to see, modify, and distribute, in contrast to the locked-down world of proprietary software such as Microsoft Windows. So in effect, Yahoo is allowing others not only to use Traffic Server for their own ends, but also to modify it, for example by taking advantage of its ability to accept plug-ins that adapt it for different tasks.
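How might a plug-in "adapt it for different tasks"? A common pattern is a hook registry: the server announces events such as "a request has been read," and each registered plug-in gets a chance to inspect or rewrite the request. The Python sketch below illustrates that general pattern under stated assumptions; the event name `read_request`, the dictionary-based request, and both sample plug-ins are hypothetical, and Traffic Server's real plug-in interface is a C API that is not reproduced here.

```python
from typing import Callable, Dict, List

# Hypothetical request shape: header name -> header value.
Request = Dict[str, str]
Hook = Callable[[Request], Request]


class HookRegistry:
    """Lets plug-ins attach functions to named server events."""

    def __init__(self) -> None:
        self._hooks: Dict[str, List[Hook]] = {}

    def register(self, event: str, hook: Hook) -> None:
        self._hooks.setdefault(event, []).append(hook)

    def fire(self, event: str, request: Request) -> Request:
        # Plug-ins run in registration order; each may return a modified request.
        for hook in self._hooks.get(event, []):
            request = hook(request)
        return request


def add_header(request: Request) -> Request:
    """Hypothetical plug-in: tag every request that passes through the proxy."""
    return {**request, "X-Served-By": "proxy"}


def rewrite_host(request: Request) -> Request:
    """Hypothetical plug-in: send traffic for a retired host name to its replacement."""
    if request.get("Host") == "old.example.com":
        return {**request, "Host": "new.example.com"}
    return request


registry = HookRegistry()
registry.register("read_request", add_header)
registry.register("read_request", rewrite_host)
```

The appeal of this design is that the core server never needs to know what the plug-ins do; new behavior is added by registering another function rather than editing the server itself.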
Giving away the farm?
So isn't there a risk that Yahoo is giving away some pretty important technology that's central to its business? Plenty of start-ups today are trying to grow to Yahoo's scale, and many of them are competitors.
Some Yahoo rival might very well gain as a result, but on balance, the company thinks that it'll come out ahead. For one thing, Traffic Server in isolation is not as powerful as Traffic Server woven into Yahoo's computing fabric, the company argues.
"What we're giving up is a generic building block. What makes it really interesting at Yahoo is how we've connected it with other things to make a bigger service," Neerdaels said. As for Yahoo's major rivals: "We suspect our larger competitors already have some solution they're happy with."
Yahoo expects a number of benefits from broader development and use of Traffic Server.
"We think a lot of folks can benefit from this, and by raising the tide, we think we can benefit as well," Shugar said.
For one thing, making Traffic Server open-source software will mean that more people grow familiar with it, making it easier for Yahoo to hire engineers who are already up to speed.
"By virtue of basing services on open-source software, we attract people who want to work on open source. They like it, and they like the idea of it. It's a skill they can take with them from one place to another," Shugar added.
For another, Yahoo can benefit from others adapting the software to a broader range of uses, he said.
Gaining influence among developers
There are intangible benefits, as well, when it comes to recognition among programmers, whose influence in some ways makes them the digital elite. Microsoft long ago learned that much of its power comes from developer allies, and Google is trying to put that lesson to good use as well by releasing many open-source projects, the Chromium code behind its Google Chrome browser being one recent example.
Yahoo isn't in the business of selling technology to others in the manner of Amazon Web Services, Microsoft Azure, or Google App Engine. But having solid technology is essential to Yahoo. While it's willing to sell its search business and engineering skills to Microsoft, it still needs in-house expertise to power its many Web properties and to reduce its operating costs.
It's only a "trickle" of data, but at Yahoo's scale, that can be some pretty heavy work. "When they moved to using the Traffic Server front end, they shaved something like 200 machines off their back end because session management was more efficient," Neerdaels said.
Another part of Yahoo operations retrofitted with the software is Yahoo Mail, he said. Traffic Server can be used to process the cookies stored by a person's browser to determine whether that person can be logged in automatically or needs to authenticate anew. It also can route traffic appropriately when, for example, a person who is "homed" to Yahoo's servers in India visits the site while in the United States.
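The two Yahoo Mail chores described above, checking a login cookie and routing a user to his or her "home" servers, can be sketched in a few lines of Python. Everything here is an assumption for illustration: the cookie name `session`, its `user:region:signature` format, the signing key, and the back-end host names are hypothetical, since the article does not describe Yahoo's actual cookie scheme.

```python
import hashlib
import hmac
from http.cookies import SimpleCookie
from typing import Optional, Tuple

# Hypothetical signing key and cookie format ("user:region:signature").
SECRET = b"demo-secret"

# Hypothetical mapping from a user's home region to the servers holding that
# user's mail. The visitor's current location is deliberately ignored.
BACKENDS = {"in": "mail.in.example.net", "us": "mail.us.example.net"}


def sign(user: str, region: str) -> str:
    """Signature the login service would have placed in the cookie."""
    msg = f"{user}:{region}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def route(cookie_header: str) -> Tuple[str, Optional[str]]:
    """Return ("login", None) to force a fresh sign-in, or
    ("forward", host) to send the request to the user's home servers."""
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    morsel = cookies.get("session")
    if morsel is None:
        return ("login", None)
    parts = morsel.value.split(":")
    if len(parts) != 3:
        return ("login", None)
    user, region, sig = parts
    # Constant-time comparison so the check does not leak timing information.
    if not hmac.compare_digest(sig.encode(), sign(user, region).encode()):
        return ("login", None)
    backend = BACKENDS.get(region)
    if backend is None:
        return ("login", None)
    return ("forward", backend)
```

Because the home region travels inside the signed cookie, a user homed in India is forwarded to the India servers even when visiting from the United States, and a cookie whose region has been tampered with fails the signature check and triggers a fresh login.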
Traffic Server also handles many more nuts-and-bolts tasks. For example, it can cache Web data closer to browsers so the original Web servers that house the data aren't as overtaxed. And it can cache the Internet addresses it looks up in the Domain Name System, so repeated requests for the same site don't have to wait on a fresh lookup.
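Both kinds of caching rest on the same idea: keep a copy of an answer for a limited time so repeat requests don't have to go back to the slow original source. Here is a minimal Python sketch of a time-limited ("TTL") cache; the class name `TTLCache` and its `fetch` callback are hypothetical illustrations, not Traffic Server's internal design, which also handles concerns such as memory limits and eviction that this sketch omits.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class TTLCache(Generic[K, V]):
    """Keep each value for `ttl` seconds so repeat requests skip the origin."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock  # injectable so tests can control time
        self._store: Dict[K, Tuple[float, V]] = {}
        self.misses = 0

    def get(self, key: K, fetch: Callable[[K], V]) -> V:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: the origin is not contacted
        # Miss or expired entry: ask the origin, then remember the answer.
        self.misses += 1
        value = fetch(key)
        self._store[key] = (now + self.ttl, value)
        return value
```

The same class can hold Web pages keyed by URL or DNS answers keyed by host name. For example, `TTLCache(ttl=300).get("www.example.com", socket.gethostbyname)` would resolve a name once and reuse the address for five minutes.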
What's it good for?
Some of these chores can be handled by existing software, such as Squid, which is already open source. But Yahoo is on a roll with its open-source work, as the company seeks to advance its internal cloud-computing infrastructure. Expect more to come.
"As various pieces of our cloud get to a point of maturity, we will open-source specific pieces," Shugar said. Future candidates include Yahoo's system for hosting its Web applications on a virtualized, more flexible foundation, as well as its Sherpa and Mobstor data-storage services.
Winning open-source allies can be difficult, and Neerdaels said it takes an engineer a good six months to fully comprehend all of Traffic Server's code, so immediate gains beyond goodwill are unlikely.
But in the long run, Yahoo's program could pay significant dividends. Building a series of significant open-source packages could lead to a Yahoo infrastructure that's high-power but more standard than custom-made.
It's not every day that large, significant software packages arrive on the Net in open-source form--much less a series of them that are increasingly relevant to a competitive market of large-scale Web sites.
In this case, Yahoo's gift may indeed become Yahoo's gain.