Documentation
- List which files from libtextcat 2.2 go where ?
- Say where libtextcat 3.0 can (and cannot ;-) be found
- MIME type should be as returned by xdg-utils' 'xdg-mime query filetype ...'
- Try listing names of dependency packages for most distros
- How to troubleshoot with delve, get to a file with filters and labels
- Explain when indexing and updating are done, eg in the results list, Index on
  a result already indexed doesn't update it

General
- Fix the FIXMEs
- Get rid of dead code/classes/methods...
- Advertise service via Rendezvous
- Extend metadata beyond title,location,language,type,timestamp,size
- Don't package gmo files, they are platform dependent
- CLI programs to use tty highlighting if available

Barpanel
- Write a plugin for barpanel http://www.igelle.org/barpanel/

Tokenize
- Allow caching documents that had to be converted ? eg PDF, MS Word
- Write a PDF filter that handles columns correctly, with poppler ?
- WordPerfect filter with libwpd
- Office filter with libgst
- TeX filter
- HtmlFilter to look for META tags Author, Creator, Publisher and CreationDate
- XmlFilter is slow-ish, rewrite file parsing with the TextReader interface
- Filters should at least return errno when they fail

SQL
- Move history files into the index directories

Monitor
- Implement support for Solaris FEM

Collect
- Comply with the robots exclusion rules defined at http://www.robotstxt.org/
- Harvest mode grabs all pages on a specific site down to a certain depth
- Make User-Agent string configurable
- Make download timeout configurable
- Support for HTML frames
- Test NeonDownloader

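A minimal sketch of the robots.txt item above, to show the kind of check the
downloader would need before fetching a page. It is deliberately simplified:
it only honours records that apply to all robots ("User-agent: *") and treats
each Disallow value as a plain path prefix. The names parseDisallowRules() and
isAllowed() are hypothetical and do not exist in the code base.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Strips leading and trailing blanks, including a trailing '\r'.
static std::string trim(const std::string &text)
{
    const char *blanks = " \t\r\n";
    const std::string::size_type first = text.find_first_not_of(blanks);
    if (first == std::string::npos)
        return "";
    const std::string::size_type last = text.find_last_not_of(blanks);
    return text.substr(first, last - first + 1);
}

// Hypothetical helper: collects the Disallow prefixes of every record
// that names "User-agent: *". Other records are ignored.
std::vector<std::string> parseDisallowRules(const std::string &robotsTxt)
{
    std::vector<std::string> rules;
    std::istringstream input(robotsTxt);
    std::string line;
    bool appliesToAll = false;  // the current record names "User-agent: *"
    bool lastWasAgent = false;  // the previous line was a User-agent line

    while (std::getline(input, line))
    {
        // Drop comments
        const std::string::size_type hash = line.find('#');
        if (hash != std::string::npos)
            line.erase(hash);

        const std::string::size_type colon = line.find(':');
        if (colon == std::string::npos)
        {
            lastWasAgent = false;
            continue;
        }

        std::string field = trim(line.substr(0, colon));
        std::transform(field.begin(), field.end(), field.begin(),
            [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        const std::string value = trim(line.substr(colon + 1));

        if (field == "user-agent")
        {
            // Consecutive User-agent lines belong to the same record
            if (!lastWasAgent)
                appliesToAll = false;
            appliesToAll = appliesToAll || (value == "*");
            lastWasAgent = true;
        }
        else
        {
            lastWasAgent = false;
            // An empty Disallow value means nothing is disallowed
            if (field == "disallow" && appliesToAll && !value.empty())
                rules.push_back(value);
        }
    }
    return rules;
}

// Hypothetical helper: a path is allowed unless it starts with a disallowed prefix.
bool isAllowed(const std::vector<std::string> &rules, const std::string &path)
{
    for (const std::string &rule : rules)
    {
        if (path.compare(0, rule.size(), rule) == 0)
            return false;
    }
    return true;
}
```

The rules would be parsed once per site and isAllowed() called with the path
part of each URL before it is queued. Matching on the configured User-Agent
string rather than only "*" would be the natural next step.
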
Search
- With engines that provide a redirection URL for results (eg Acoona), it looks like
  the query history is not saved/checked correctly
- Make sure Description files' SyndicationRight is not private or closed
- getCloseTerms() should be a search engine method so that WebEngine can use plugins'
  suggestions Url field (http://developer.mozilla.org/en/docs/Supporting_search_suggestions_in_search_plugins)
- Filters with CJKV should work better; supporting quoting would help, eg title:"你好"
- Check Mozdex plugin once it's back up
- Add a plugin for http://arxiv.org/find

Index
- Play around with the XAPIAN_FLUSH_THRESHOLD env var
- MD5 hash to determine on updates whether documents have changed, as done by omindex
- Allow access to remote Xapian indexes tunneled through ssh with xapian-progsrv,
  and make sure ssh will ask for passwords with /usr/libexec/openssh/ssh-askpass
- Index Nautilus metadata (http://linuxboxadmin.com/articles/nautilus.php)
- Reverse terms so that left wildcards can be applied ?
- XapianIndex could do with some common code refactoring
- Automatically categorize documents based on MIME type and source into picture, video, etc...
- After indexing or updating a document, a call to getDocumentInfo() shouldn't be necessary
- Labels and the rest of DocumentInfo are handled separately, they shouldn't be
- Indexes have no knowledge of indexId's
- Be ready to catch DatabaseModifiedError exceptions and reopen the index
- Think about security issues, especially when indexes are shared, based on http://plg.uwaterloo.ca/~claclark/fast2005.pdf

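A minimal sketch of the "reverse terms" idea above: if every term is also
stored reversed, a left wildcard such as *ing becomes an ordinary prefix
lookup on "gni". This uses a plain std::set rather than a Xapian index, and
the class ReversedTermIndex is hypothetical. Reversal is byte-wise here, so
it is only correct for single-byte text; UTF-8 terms would need reversing by
character.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper, not part of Pinot or Xapian: keeps terms reversed
// in a sorted set so that suffix queries turn into prefix scans.
class ReversedTermIndex
{
public:
    void addTerm(const std::string &term)
    {
        m_reversed.insert(std::string(term.rbegin(), term.rend()));
    }

    // Returns all terms ending with the given suffix, in the sorted
    // order of their reversed forms.
    std::vector<std::string> findBySuffix(const std::string &suffix) const
    {
        const std::string prefix(suffix.rbegin(), suffix.rend());
        std::vector<std::string> matches;

        // Start at the first reversed term >= prefix, stop at the first
        // one that no longer begins with it
        for (std::set<std::string>::const_iterator it = m_reversed.lower_bound(prefix);
             it != m_reversed.end() && it->compare(0, prefix.size(), prefix) == 0;
             ++it)
        {
            matches.push_back(std::string(it->rbegin(), it->rend()));
        }
        return matches;
    }

private:
    std::set<std::string> m_reversed;
};
```

In an actual index the reversed forms would presumably be stored as extra
terms under their own prefix, at the cost of roughly doubling the term count.
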
Mail
- Find out what kind of locking scheme Mozilla uses (POSIX lock ?) and use that
- Index Evolution email (Camel, might be useful for other types actually)
- Index mail headers
- Decipher and use Mozilla's mailbox scheme, eg
  mailbox://mbox_file_name?number=2164959&part=1.2&type=text/plain&filename=portability.txt
- Keep track of attachments and avoid indexing the same file twice
- Mailboxes where all messages are flagged by Mozilla, or that are empty, are not indexed at all

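A minimal sketch of how the mailbox URL shown above could be split into its
parts. It assumes the URL is laid out like the example, ie a file name
followed by key=value pairs separated by '&'. What number, part and the
other fields actually mean to Mozilla is exactly what remains to be worked
out. MailboxUrl and parseMailboxUrl() are hypothetical names, and no
percent-decoding is attempted.

```cpp
#include <map>
#include <string>

// Hypothetical structure holding the pieces of a mailbox:// URL.
struct MailboxUrl
{
    std::string file;
    std::map<std::string, std::string> params;
};

// Splits "mailbox://FILE?key=value&key=value" into file and parameters.
// Returns false if the URL doesn't use the mailbox scheme.
bool parseMailboxUrl(const std::string &url, MailboxUrl &result)
{
    const std::string scheme("mailbox://");
    if (url.compare(0, scheme.size(), scheme) != 0)
        return false;

    const std::string::size_type queryPos = url.find('?', scheme.size());
    if (queryPos == std::string::npos)
        result.file = url.substr(scheme.size());
    else
        result.file = url.substr(scheme.size(), queryPos - scheme.size());

    result.params.clear();
    if (queryPos == std::string::npos)
        return true;

    std::string::size_type start = queryPos + 1;
    while (start < url.size())
    {
        std::string::size_type end = url.find('&', start);
        if (end == std::string::npos)
            end = url.size();

        const std::string pair(url.substr(start, end - start));
        const std::string::size_type equals = pair.find('=');
        if (equals != std::string::npos)
            result.params[pair.substr(0, equals)] = pair.substr(equals + 1);
        else if (!pair.empty())
            result.params[pair] = "";  // key without a value

        start = end + 1;
    }
    return true;
}
```

With the example URL, file comes out as mbox_file_name and params holds
number, part, type and filename.
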
Daemon
- Allow building without the daemon
- Allow deactivating the D-Bus interface
- Clean up method names
- Prefer ustring to string whenever possible
- Queue unindexing too
- Follow updates to Xesam specs
- Send a signal when crawling is done so that the UI can reopen the index
- The daemon should ask for permission before reindexing, especially if the corpus is large
- What does a first run mean for the daemon ? ie no configuration file
- Daemon should use worker threads' doWork() instead of duplicating code
- Only crawl newly added locations when the configuration changes

UI
- Show which threads are running, what they are doing, and allow stopping them
  selectively
- Display search engine icons (Gtk::IconSource::set_filename() and Gtk::Style::render_icon())
- Replace glademm with libglademm ?
- Use unique (http://www.gnome.org/~ebassi/source/) if available
- Either Live Query behaves like a live query (eg results list updated when new
  documents match) or it is renamed to something else to avoid confusion
- When viewing or indexing a result, all rows for that same URL should be updated with
  the Viewed or Indexed icons (the latter after IndexingThread returns)
- Make use of GTKmm 2.10 StatusIcon
- Unknown exceptions in IndexingThread or elsewhere should be logged as errors
- Delete all temporary files when exiting
- Query expansion should be interactive
- Default cache provider should be configurable
- Offer to index newly mounted volumes
- UI doesn't show documents indexed by the daemon the very first time it's run,
  at least until it's restarted
- Status dialog to show time of latest update
- Unique preferences
- Use gtk2 2.14's gtk_show_uri()
- Reload settings after preferences exit
- Changing the group by mode a few times will show index results under engine "xapian", why ?
- getIndexNames() to return ustring's
- Always call getIndexPropertiesByName() with a ustring, store engine names as ustring's

v0.90
- Filters should have a version number so that new versions only reindex documents
  of the given type
- Queries should be cancellable
- Queries should return the top N results first, then the rest
- D-Bus (Simple)Query shouldn't let the bus connection time out before replying
- Live and stored queries shouldn't cap the number of results, only the number of results per page
- For each query group in the results list, show Next and Previous buttons to page through results
- Browse mode to be merged with the new search and page mode
- The query builder for stored queries should be available for live queries too
- PinotSettings and threads to be moved outside of UI
- CLI programs shouldn't require details about backends, should know indexes by name and
  know their backends
- pinot-search should be able to run stored queries found in the configuration file
- pinot-index should be able to index directories recursively, as done by the daemon
- Command-line tools to work with relative paths