Article: Introduction to Nutch, Part 1: Crawling
Subject: Crawl
Date: 2006-01-12 05:54:09
From: athome
Response to: Crawl

I added 127.0.0.1 "hostname" to the hosts file.
After running the same command, I get:
060112 085356 Added 0 pages
060112 085356 FetchListTool started
060112 085356 Overall processing: Sorted 0 entries in 0.0 seconds.
060112 085356 Overall processing: Sorted NaN entries/second
060112 085356 FetchListTool completed
060112 085357 logging at INFO
060112 085358 Updating /usr/local/nutch/nutch-0.7.1/test/db
060112 085358 Updating for /usr/local/nutch/nutch-0.7.1/test/segments/20060112085356
060112 085358 Finishing update
060112 085358 Update finished
060112 085358 FetchListTool started
060112 085358 Overall processing: Sorted 0 entries in 0.0 seconds.
060112 085358 Overall processing: Sorted NaN entries/second
060112 085358 FetchListTool completed
060112 085358 logging at INFO
060112 085359 Updating /usr/local/nutch/nutch-0.7.1/test/db
060112 085359 Updating for /usr/local/nutch/nutch-0.7.1/test/segments/20060112085358
060112 085359 Finishing update
060112 085400 Update finished
060112 085400 FetchListTool started
060112 085400 Overall processing: Sorted 0 entries in 0.0 seconds.
060112 085400 Overall processing: Sorted NaN entries/second
060112 085400 FetchListTool completed
060112 085400 logging at INFO
060112 085401 Updating /usr/local/nutch/nutch-0.7.1/test/db
060112 085401 Updating for /usr/local/nutch/nutch-0.7.1/test/segments/20060112085400
060112 085401 Finishing update
060112 085401 Update finished
060112 085401 FetchListTool started
060112 085401 Overall processing: Sorted 0 entries in 0.0 seconds.
060112 085401 Overall processing: Sorted NaN entries/second
060112 085401 FetchListTool completed
060112 085401 logging at INFO
060112 085403 Updating /usr/local/nutch/nutch-0.7.1/test/db
060112 085403 Updating for /usr/local/nutch/nutch-0.7.1/test/segments/20060112085401
060112 085403 Finishing update
060112 085403 Update finished
060112 085403 FetchListTool started
060112 085403 Overall processing: Sorted 0 entries in 0.0 seconds.
060112 085403 Overall processing: Sorted NaN entries/second
060112 085403 FetchListTool completed
060112 085403 logging at INFO
060112 085404 Updating /usr/local/nutch/nutch-0.7.1/test/db
060112 085404 Updating for /usr/local/nutch/nutch-0.7.1/test/segments/20060112085403
060112 085404 Finishing update
060112 085404 Update finished
060112 085404 Updating /usr/local/nutch/nutch-0.7.1/test/segments from /usr/local/nutch/nutch-0.7.1/test/db
060112 085405 reading /usr/local/nutch/nutch-0.7.1/test/segments/20060112085356
060112 085405 reading /usr/local/nutch/nutch-0.7.1/test/segments/20060112085358
060112 085405 reading /usr/local/nutch/nutch-0.7.1/test/segments/20060112085400
060112 085405 reading /usr/local/nutch/nutch-0.7.1/test/segments/20060112085401
060112 085405 reading /usr/local/nutch/nutch-0.7.1/test/segments/20060112085403
060112 085405 Sorting pages by url...
060112 085405 Getting updated scores and anchors from db...
060112 085405 Sorting updates by segment...
060112 085405 Updating segments...
060112 085405 Done updating /usr/local/nutch/nutch-0.7.1/test/segments from /usr/local/nutch/nutch-0.7.1/test/db
060112 085405 indexing segment: /usr/local/nutch/nutch-0.7.1/test/segments/20060112085356
060112 085405 * Opening segment 20060112085356
060112 085405 * Indexing segment 20060112085356
060112 085405 * Optimizing index...
060112 085405 * Moving index to NFS if needed...
060112 085405 DONE indexing segment 20060112085356: total 0 records in 0.054 s (NaN rec/s).
060112 085405 done indexing
060112 085405 indexing segment: /usr/local/nutch/nutch-0.7.1/test/segments/20060112085358
060112 085405 * Opening segment 20060112085358
060112 085405 * Indexing segment 20060112085358
060112 085405 * Optimizing index...
060112 085405 * Moving index to NFS if needed...
060112 085405 DONE indexing segment 20060112085358: total 0 records in 0.034 s (NaN rec/s).
060112 085405 done indexing
060112 085405 indexing segment: /usr/local/nutch/nutch-0.7.1/test/segments/20060112085400
060112 085405 * Opening segment 20060112085400
060112 085405 * Indexing segment 20060112085400
060112 085405 * Optimizing index...
060112 085405 * Moving index to NFS if needed...
060112 085405 DONE indexing segment 20060112085400: total 0 records in 0.032 s (NaN rec/s).
060112 085405 done indexing
060112 085405 indexing segment: /usr/local/nutch/nutch-0.7.1/test/segments/20060112085401
060112 085405 * Opening segment 20060112085401
060112 085405 * Indexing segment 20060112085401
060112 085405 * Optimizing index...
060112 085405 * Moving index to NFS if needed...
060112 085405 DONE indexing segment 20060112085401: total 0 records in 0.129 s (NaN rec/s).
060112 085405 done indexing
060112 085405 indexing segment: /usr/local/nutch/nutch-0.7.1/test/segments/20060112085403
060112 085405 * Opening segment 20060112085403
060112 085405 * Indexing segment 20060112085403
060112 085405 * Optimizing index...
060112 085405 * Moving index to NFS if needed...
060112 085405 DONE indexing segment 20060112085403: total 0 records in 0.03 s (NaN rec/s).
060112 085405 done indexing
060112 085405 Reading url hashes...
060112 085405 Sorting url hashes...
060112 085405 Deleting url duplicates...
060112 085405 Deleted 0 url duplicates.
060112 085405 Reading content hashes...
060112 085405 Sorting content hashes...
060112 085405 Deleting content duplicates...
060112 085405 Deleted 0 content duplicates.
060112 085405 Duplicate deletion complete locally. Now returning to NFS...
060112 085405 DeleteDuplicates complete
060112 085405 Merging segment indexes...
060112 085405 crawl finished: test
Zero pages?
"Edit the file conf/crawl-urlfilter.txt and replace MY.DOMAIN.NAME with the name of the domain you wish to crawl. For example, if you wished to limit the crawl to the apache.org domain, the line should read:
+^http://([a-z0-9]*\.)*apache.org/
This will include any url in the domain apache.org."
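
To rule out the filter itself, the quoted rule can be checked outside Nutch. The following is only a minimal sketch: it uses Python's `re` module rather than the regex engine Nutch uses, it assumes the rule is applied as a search anchored at the start of the URL, and the seed URLs and the `accepted` helper are made-up examples, not anything taken from Nutch or from my setup.

```python
import re

# Pattern copied from the quoted crawl-urlfilter.txt line, without the leading '+'
# (the '+' only marks the rule as an include rule).
PATTERN = re.compile(r"^http://([a-z0-9]*\.)*apache.org/")


def accepted(url: str) -> bool:
    """Return True if the URL matches the include rule from its start."""
    return PATTERN.search(url) is not None


# Hypothetical seed URLs, for illustration only.
seeds = [
    "http://lucene.apache.org/nutch/",  # subdomain of apache.org with a path
    "http://apache.org",                # no trailing slash after the host
    "http://localhost:8080/",           # a different host entirely
]

for url in seeds:
    print(url, "->", "accepted" if accepted(url) else "rejected")
```

If every URL in the seed file is rejected by a check like this, that would at least be consistent with the "Added 0 pages" line in the log above, although other causes are possible.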
So, why zero pages?