REBOL.net

Re: Algorithm challenge: selecting a range from multiple blocks

Sunanda (dhsunanda)
23-Sep-2007/6:47:44-4:00
#43059

Thanks for the responses so far.

I haven't had time to do any detailed timing tests on larger 
datasets, but what I have checked has worked well.

Thanks to all!

***

Tom:
> is it ok for the results to have a mix of new and existing objects
Yes -- the block is ephemeral, so get-subset is just one stage of 
winnowing it down to a final data structure.

> is 'data only appended to
Yes -- to keep the objects in the same order. There may be ways 
other than append to achieve that.

> can 'data objects with empty items: [] be safely deleted from 'data?
Yes.

> what is the ratio between updating and querying 'data
> what are typical ranges?
> how often do ranges fall within one items block?
> how big is length? data

You are really asking what the live application is. Good question....

....It's REBOL.org's search for Altme world archives.


If you look here while not logged on:
   http://www.rebol.org/cgi-bin/cgiwrap/rebol/aga-index.r
you'll see only one world archive right now. But we may add others 
(eg the original REBOL world, then its successor: REBOL2).


If you are logged on, then you will see multiple world archives: 
the RUA/user.r world is visible if you are logged on. Some other 
world archives exist too (mainly for testing). You'll only see 
those if your REBOL.org member name is on the list for those world 
archives.


The CGI search (not yet live) works by searching *all* world 
archives visible to you, and then windowing the results -- so you 
may see 100 results to a webpage. Those results may be partially 
from (say) the R3WP archive and partially from the RUA archive.


What's a typical search? It's hard to say. We want it to work well 
and fast for edge cases.....

....Like a search for the word "the" or "a". Those cases will 
produce objects with many tens of thousands of entries. If the user 
has their paging window set to (say) 50 results, typically 
get-subset will return just one object with 50 entries.

.....A search for a rare word ("bucket" is in my test data set) 
produces relatively few hits, so get-subset typically ends up 
returning all the objects with all the items -- ie the user will 
see only one page of results.

Though the code to add the pagination and emit HTML is not in 
place, you can see a sneak preview of the code to date here:

http://www.rebol.org/cgi-bin/cgiwrap/rebol/aga-search.r?q=bucket

Try while logged on, and vary the word being searched, and you'll 
get a feel for the sort of data get-subset will be working on.

To formally map to the algorithm challenge:
* there is one object per visible world archive
* the raw-hits block within each object contains zero or more 
integers; each maps to an Altme posting that contains the searched 
word(s).
* get-subset has not (yet!) been applied to the data you see on 
the webpage
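
To make that mapping concrete, here is a rough sketch of the kind of windowing get-subset would do over this data. It is written in Python rather than REBOL, purely as an illustration, and it is not the code running on REBOL.org. The Archive class, the raw_hits field name, the get_subset signature (0-based start plus a length) and the sample hit numbers are all assumptions made for the example. It keeps the objects in order, drops objects that contribute nothing to the window, and returns a mix of existing objects (when a whole block falls inside the window) and new ones (when only part of a block does).

```python
from dataclasses import dataclass, field


@dataclass
class Archive:
    """One object per visible world archive (field names are assumptions)."""
    name: str
    raw_hits: list = field(default_factory=list)


def get_subset(data, start, length):
    """Return the archives covering hits start .. start+length-1.

    Positions are 0-based and count hits across all archives in order.
    Hypothetical signature; the real get-subset may differ.
    """
    result = []
    offset = 0            # global position of the current archive's first hit
    end = start + length  # global position just past the window
    for archive in data:
        n = len(archive.raw_hits)
        lo = max(start - offset, 0)  # window start, local to this block
        hi = min(end - offset, n)    # window end, local to this block
        if lo < hi:
            if lo == 0 and hi == n:
                # Whole block is inside the window: reuse the existing object.
                result.append(archive)
            else:
                # Partial overlap: build a new object holding just the slice.
                result.append(Archive(archive.name, archive.raw_hits[lo:hi]))
        offset += n
        if offset >= end:
            break  # the window is full; later archives cannot contribute
    return result


# Made-up data: 70 hits in one archive, 60 in another.
data = [
    Archive("R3WP", list(range(100, 170))),
    Archive("RUA", list(range(500, 560))),
]
page_one = get_subset(data, 0, 50)   # first 50 hits
page_two = get_subset(data, 50, 50)  # next 50 hits, spanning both archives
```

With that made-up data, page_one holds a single R3WP object with 50 entries, and page_two holds the last 20 R3WP hits followed by the first 30 RUA hits. For a rare word whose total hit count is below the page size, every non-empty object comes back unchanged.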

***

More challenge entries welcome!

Sunanda


