PDF searcher finds technical data

Oct. 26, 2006
A search engine specifically designed to sift through information in PDF catalog pages targets technical users looking for product information.

The DirectIndustry portal search engine lists only manufacturers. A search by keyword returns a list of product images in order of keyword relevance. Company officials say this eases searching because the site prioritizes images of products and logos over pure text.


Developers at DirectIndustry.com say they have devised proprietary algorithms for converting ordinary PDF pages into versions that can be searched online. They use these techniques to run a search site devoted specifically to finding information in industrial catalogs.

According to company officials, three independent programs manipulate and extract data from catalog PDFs. The first step is to divide a catalog PDF into separate PDFs of its individual pages. Each catalog page is then converted into a JPEG file so it can be viewed in an ordinary Web browser without starting up a PDF viewer. Simultaneously, another program extracts and organizes data from the original catalog PDF into a database, tagging each word with the number of the catalog page on which it was found. This tagging lets the search engine display the JPEG of the page that contains the information of interest.
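The word-tagging step described above resembles a conventional inverted index. The Python sketch below is only an illustration under stated assumptions: DirectIndustry's algorithms are proprietary and not described in detail, so the function names (`build_page_index`, `find_pages`, `jpeg_name`), the file-naming scheme, and the sample page text are all hypothetical. It also assumes the text of each page has already been extracted from the PDF.

```python
import re
from collections import defaultdict


def build_page_index(page_texts):
    """Map each lowercase word to the set of 1-based page numbers it appears on."""
    index = defaultdict(set)
    for page_number, text in enumerate(page_texts, start=1):
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(page_number)
    return index


def find_pages(index, query):
    """Return the sorted page numbers that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


def jpeg_name(catalog_id, page_number):
    """Hypothetical naming scheme linking a page number to its JPEG rendering."""
    return f"{catalog_id}_page{page_number:04d}.jpg"


# Hypothetical page text standing in for data extracted from a catalog PDF.
sample_pages = [
    "Planetary gear units, torque ratings",
    "Rotary encoders for motor control",
    "Gear lubrication and cooling systems",
]
index = build_page_index(sample_pages)
```

With this sample data, `find_pages(index, "gear cooling")` matches only the third page, and `jpeg_name` then gives the image a site could display for that hit.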

Finally, the original single-page PDFs are stored in parallel with their corresponding JPEG images. When a user wants a closer look at an area of the page, the site reverts to the PDF for the zoom-in. The same process takes place when users select text or save the page locally.
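One way to picture this parallel storage is a record that holds both renderings of a page and a rule for choosing between them. The sketch below is an assumption-laden illustration, not the site's actual design: the `PageAssets` type, the action names, and the file paths are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical action names; the article says zooming, selecting text,
# and saving a page all fall back to the single-page PDF.
PDF_ACTIONS = {"zoom", "select_text", "save"}


@dataclass(frozen=True)
class PageAssets:
    """Both stored renderings of one catalog page."""
    jpeg_path: str  # lightweight image shown for ordinary browsing
    pdf_path: str   # original single-page PDF kept alongside it


def asset_for(assets, action):
    """Return the PDF for actions that need it, otherwise the JPEG."""
    return assets.pdf_path if action in PDF_ACTIONS else assets.jpeg_path
```

For example, `asset_for(PageAssets("p7.jpg", "p7.pdf"), "zoom")` returns the PDF path, while an ordinary page view returns the JPEG path.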

DirectIndustry says it takes 2 hr or less to render a typical PDF catalog into its searchable format. The service, called the Virtual Technical Library, now contains over 90,000 pages of technical information from over 3,400 PDFs, the company says. These come from 5,600 companies, about 26% of which are in the U.S.
