This project aims to create a World Wide Web search engine that distributes its workload in a fashion similar to successful projects such as SETI@home and distributed.net.
The project was founded in late 2004 by Alex Chudnovsky. The development work and hardware are funded by Majestic-12 Ltd.
What is it really about?
If you are reading this then you must be online, and you will probably agree that the Internet has become so important to our everyday lives that we can hardly imagine living without it. Since the Internet’s very beginning, it has been important to be able to find information quickly and accurately. Had it not been for the invention of the search engine, the Internet would not have become what it is today. In layman’s terms, search engines are like signs and maps: they point us in the direction of the information we are looking for. Imagine trying to travel on the M6 motorway (a major road here in the United Kingdom) with no signs or road maps… No one would know when or where to get off to reach their destination. If you can’t imagine your life without a search engine, then read on.
So what about search engines?
There are millions of web sites out there, with billions of pages, and so far only a handful of huge companies have been able to create a search engine that provides relevant information to its users. Big companies control the entry point to the data you seek, and neither you nor the webmasters who run the sites have a say in the matter.
How does Majestic-12 fit into all this?
Majestic-12 is developing a community-supported search engine that can scale to billions of web pages. Since the task of building a World Wide Web search engine is so huge, we have chosen to base the Majestic-12 Distributed Search Engine on the concept of distributed computing: many machines working on one task get it done quicker than one large machine alone. One of the biggest challenges for search engines is actually fetching billions of pages, and to do this cost-effectively we have created client software called MJ12node that can be run on otherwise idle computers. This concept was used successfully by projects like SETI@home and distributed.net.
The MJ12node software combines machines from all around the globe to crawl, collate and then send their findings back to the master server. The crawled data will be analysed (indexed) and added to the Majestic-12 search engine. The result? Hopefully the biggest crawl of the web, and perhaps even the most up-to-date search engine of its time.
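To make the division of labour concrete, here is a minimal Python sketch of a master server handing out batches of URLs to nodes and collecting what they send back. It is an illustration only, not the actual MJ12node protocol: the names MasterServer, crawl_one_batch and fake_fetch are hypothetical, and the "download" is a stand-in function so the sketch runs without a network connection.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


class MasterServer:
    """Hypothetical coordinator: hands out URL batches and collects results."""

    def __init__(self, urls: Iterable[str], batch_size: int = 2) -> None:
        self.pending = deque(urls)            # URLs not yet given to any node
        self.batch_size = batch_size
        self.collected: Dict[str, str] = {}   # url -> page content sent back

    def next_batch(self) -> List[str]:
        """Return up to batch_size URLs, or an empty list when none remain."""
        batch: List[str] = []
        while self.pending and len(batch) < self.batch_size:
            batch.append(self.pending.popleft())
        return batch

    def submit(self, results: Dict[str, str]) -> None:
        """Merge one node's findings into the central collection."""
        self.collected.update(results)


def crawl_one_batch(master: MasterServer, fetch: Callable[[str], str]) -> int:
    """Hypothetical node step: take one batch, fetch it, report back.

    Returns the number of pages crawled in this step (0 means no work left).
    """
    batch = master.next_batch()
    if batch:
        master.submit({url: fetch(url) for url in batch})
    return len(batch)


def fake_fetch(url: str) -> str:
    # Stand-in for a real HTTP download so the sketch runs offline.
    return f"<html>content of {url}</html>"


if __name__ == "__main__":
    master = MasterServer([f"http://example.com/{i}" for i in range(5)])
    # Two simulated nodes take turns until the master has no work left.
    crawled = {"node-a": 0, "node-b": 0}
    done = False
    while not done:
        done = True
        for name in crawled:
            n = crawl_one_batch(master, fake_fetch)
            crawled[name] += n
            done = done and n == 0
    print(crawled, len(master.collected))
```

In a real deployment the nodes would run on separate machines and talk to the master over the network. The point of the sketch is only that each URL is handed to exactly one node and that all results end up in one place for indexing.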
Why run a Node?
By running a Majestic-12 Node you will achieve a number of worthy goals:
- help the Internet community to create the search engine they control
- use your otherwise idle computer and broadband connection – you paid for it, why not use it?
- help science to understand the Web better
- be part of the growing community that will help shape the Internet the way we want!
- prove that one man counts: your contribution to the effort will be visible
So if you’re interested in starting the Distributed Search Engine Revolution… then follow me!
This inspirational speech is based on Evil-Dragon’s original introduction to the MJ12node manual.
P.S. In the photo above you can see Alex Chudnovsky, the founder of the UK’s division of Majestic-12, during a training session with majestic squirrels (the greys) at an undisclosed location.
Majestic-12: DSearch: Technology
MJ12bot, the principal distributed component of the Majestic-12 search engine project, is the subject of continuing investment by Majestic-12 Ltd. The results of this crawl are fed into a specialised search engine with daily updates. A full explanation follows.
The first prototype full text search engine was built in 2006, but has not operated for some time.
The prototype full text engine contained an index of around 1 billion pages. After assessing the prototype, Majestic-12 came to the conclusion that in order to operate a full text engine in any meaningful capacity, significant steps were needed to enhance the relevance of search, performance, and speed of update. In addition, it was recognised that to run a search engine of any scale, large investments would have to be made in hardware and infrastructure.
As a result, research projects were initiated to improve the quality of the crawl and to build key search engine components such as a link map. To facilitate these projects, some of the research results were commercialised, with the MajesticSEO product launching in 2008.
Majestic-12 now operates a greatly enhanced crawl and updates its web-scale backlinks index daily. This backlinks index is open for queries using a dedicated, high-performance search at MajesticSEO.com. Majestic-12 continues to offer webmasters the ability to download data for their own sites for free via MajesticSEO, and continues to invest in the improvement of its crawler and search infrastructure.
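For readers unfamiliar with the term, a backlinks index answers the question "which pages link to this page?". The Python sketch below shows the basic idea of building one by inverting a link map. It is a toy illustration under our own assumptions, not a description of how the MajesticSEO index is implemented: the function names and the sample URLs are hypothetical, and a web-scale index would not fit in an in-memory dictionary.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_backlink_index(link_map: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Invert a link map (page -> pages it links to) into page -> pages linking to it."""
    backlinks: Dict[str, Set[str]] = defaultdict(set)
    for source, targets in link_map.items():
        for target in targets:
            if target != source:          # ignore pages linking to themselves
                backlinks[target].add(source)
    return dict(backlinks)


def backlinks_for(index: Dict[str, Set[str]], url: str) -> List[str]:
    """Return the pages that link to url, sorted for stable output."""
    return sorted(index.get(url, set()))


if __name__ == "__main__":
    # Hypothetical crawl result: each page and the links found on it.
    link_map = {
        "a.example/": ["b.example/", "c.example/"],
        "b.example/": ["c.example/"],
        "c.example/": ["a.example/", "c.example/"],
    }
    index = build_backlink_index(link_map)
    print(backlinks_for(index, "c.example/"))
```

The crawl naturally produces the forward direction (the links found on each page), so the inversion step is what turns raw crawl data into something a webmaster can query for their own site.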
Majestic-12 continues to make strides towards developing an understanding of the architecture required for effective full text search, and continues to develop the components required for a quality web scale full text search.