|
|
|
|
|
|
|
|
CTK Exchange
Mutley2003
Member since Apr-8-06
|
Apr-08-06, 10:10 PM (EST) |
 |
"Rank order from the min number of paired comparisons"
|
Hi I think this problem has something in common with tournaments, but I am not sure quite where to start. Also with binary search, perhaps.Suppose I have m items, and I wish to ascertain their complete rank order 1..m using only paired comparisons and the minimum, on average, number of paired comparisons at that. I can do the comparisons sequentially ie knowing the results of all prior tests. I am looking for an algorithm and some understanding of its optimality or otherwise. As an example suppose I have seven items and the test sequence goes like this (a,b comparison >> winner) 1,2 >> 2 3,4 >> 3 5,6 >> 6 7 has not been tested yet. If I introduce the concept of "possibly informative comparisons" now, we have remaining 1,3 1,4 1,6 1,7 2,3 2,4 2,5 2,6 2,7 etc etc But testing 2,3 (winner against winner) means that if 3 is the winner then we never need test 1,3 although we will have to test 2,4 So, there should be some gains (I think) in testing winner against winner (and then loser of that against previous winner etc). I am sure this is well trodden ground, and I would appreciate any pointers to set me on the right path. I would also be interested in extensions to the problem eg if we are interested in establishing only the top k ranks, and don't care about the rest. I am also interested in the algorithmics, how I would write it in code. One idea I have is to start with all the items on a number line with upper and lower possible bounds : a test would then serve to shrink the bounds and/or make them non overlapping. Sorry for the woolly thinking here, I am just feeling my way. Any help would be greatly appreciated.
|
|
Alert | IP |
Printer-friendly page |
Reply |
Reply With Quote | Top |
|
|
mr_homm
Member since May-22-05
|
Apr-09-06, 04:10 PM (EST) |
 |
1. "RE: Rank order from the min number of paired comparisons"
In response to message #0
|
Hi Mutley2003, >Suppose I have m items, and I wish to ascertain their >complete rank order 1..m using only paired comparisons and >the minimum, on average, number of paired comparisons at >that. I can do the comparisons sequentially ie knowing the >results of all prior tests. > >I am looking for an algorithm and some understanding of its >optimality or otherwise. This is basically a sorting problem. There are various known algorithms for this, from bubble-sort (very simple to understand but very slow) to quick-sort (currently, as far as I know, tied with several other algorithms for fastest execution). Quick-sort is a very easy to understand algorithm. Basically, you start with your m items in their initial order, and assign them initial position numbers 1 through m (the easy way to do this is to simply store the values of the items in an array, so the array index is their number). You then pick an item at random (called the pivot) and compare everything else to it to produce two subarrays, one with everything smaller and the other with everything larger than the given element. You now have a situation where every element in the first array is known to be smaller than every element of the second array, so it is never necessary to compare them with each other again. Therfore the problem breaks down into tow smaller problems, namely sorting each of the subarrays. Now quick-sort calls ITSELF on each of the sub-arrays. Eventually all the sub-arrays have size one and the algorithm exits. At this point, every element must pretty obviously be in its correct place. It is a very recursive algorithm, as you can see. This structure is very similar to the algorithmic structure of the Fast Fourier Transform algorithm, and accounts for its very good execution time. It is called a "divide and conquer" algorithm. Quick-sort performs m/2 swaps (on average) and then calls two copies of itself, which each perform m/4 swaps, for a total of another m/2, and then 4 copies of itself are called, each performing m/8 swaps, again for a total of another m/2, and so on. Each level of subroutine makes m/2 swaps, and there are at most log_2(m) levels of calling. Hence the total number of swaps averages to (m/2)log_2(m), which is very good indeed. As far as I am aware, it is not known whether a faster algorithm can exist (someone correct me if I am wrong), but this is the currently fastest known algorithm that works for general m. For specific cases, such as the m=7 example you give, there can of course be faster sorting tricks, but they will only work for that specific value of m. Here is a link to somebody else's Description of quick-sort. His discussion is good, but he is wrong about it being hard to program. The first time I heard about it, I sat down and make a working program in about 15 minutes. He mentions that sometimes the behavior of quick-sort is not very good, because if the data is already nearly ordered and you pick the first item in the list as the pivot, it runs in n^2 time, the same as bad old bubble-sort. There is a way to avoid this by thinking about how numbers are represented in your data set. Suppose you are using unsigned integer data. In that case, you can avoid the pivot idea altogether: just sort the list on the highest bit into two sublists with initial bit 0 and initial bit 1, then sort each of the sublists on the second bit, and so on. What if the numbers are signed integers? Then if they are stored in sign_magnitude format (not common!) sort on the first bit (0 means positive and 1 means negative in this format) and then sort the positive sublist just like the unsigned integers, and sort the negative list in reverse order. If they are stored in 2's complement (which is almost universally used) the bit complementing process fixes up the order of the negative numbers for you automatically, and so you can sort these exactly like unsigned integers. If they are floating point, the exponent is usually stored in sign_magnitude and the rest of the digits in 2's complement. The first digit is the mantissa sign, and the second digit is the exponent sign. In this case, sort on the first bit, then on the second bit, then on the exponent bits (sort the sublist with negative exponent in reverse order) then on the mantissa bits (in the normal order). This implementation depends on knowing the data format of your numbers, but it is never slower than quick-sort and has much better worst-case behavior, because you can't run into the badly chosen pivot problem. It has the advantage that you can produce the sublists by simply swapping the elements of the array in place, without creating new arrays to hold the subarrays. You make pointers FIRST and LAST to the first and last array elements and call sort(FIRST,LAST). Inside sort() you first check if FIRST = LAST and exit if so. Then save them as FIRST_0 and LAST_0, then decrement FIRST and increment LAST until FIRST points to an element with the current sort bit = 1 and LAST points to an element with bit = 0. Swap those two elements and continue moving the pointers until FIRST = LAST-1. Now FIRST points to the last element of the subarray with bit 0 and LAST points to the first element of the subarray with bit 1. At this point, sort() calls sort(FIRST_0,FIRST) to sort the subarray with bit 0, and calls sort(LAST,LAST_0) to sort the subarray with bit 1. This still uses a lot of stack for the recursive calls, but it's all pointers into the same data array, and so it never steps outside the originally allocated space. As you might be able to tell from the level of detail I went into here, this implementation of quick-sort is my own innovation, and I can't resist talking just a little too much about it. Just ignore it if it's beyond the scope of your question. Hope this helps! --Stuart Anderson |
|
Alert | IP |
Printer-friendly page |
Reply |
Reply With Quote | Top |
|
|
 |
Mutley2003

guest
|
Apr-10-06, 07:51 PM (EST) |
|
2. "RE: Rank order from the min number of paired comparisons"
In response to message #1
|
Thanks Stuart for the quicksort ideas and algorithm .. I will program your implementation some time. Sorting won't work because I don't know the values of the items (sorry if I did not make this clear). I can only ascertain which of a pair is better after doing a test. Like a tennis tournament. afaik, a tournament is usually "programmed" as a) initial random pairings b) winner of each pair plays winner of any other pair selected at random c) process terminates when only 1 pair remains, and the winner of that pair is determined So, with M players I can determine the winner in M-1 matches. This only gives me the winner (rank 1). But not the complete rank order, although there is some partial information. Suppose we have 6 players A B C D E F First round A,B -> B (meaning A plays B and B wins) C,D -> C E,F -> F Second round B,C -> B F stands aside Final round B,F -> B So, what we know B>C>D B>A B>F>E We dont know F?C A?C (we don't also need A?D) A?F (we don't need A?E) Possible rank orders, consistent with what we know BFECDA BFCEDA etc etc etc We know some things A could not be #2, nor could D or E (the losers on the first round) #2 can only be C or F If we now test C,F (the loser of the penultimate round with the indeterminate one) Say C,F ->F We now know the rank order must be BFsomething maybe BFCsomething or BFAsomething (third place cannot be D or E (because they lost) or B or F) I guess we test A,C A,C ->C We now have BFCsomething Possible candidates for #4 A,D,E test A,D ->D A,E ->E D,E ->D implying the order is DEA which looks like we have finished ie BFCDEA (the underlying, unknown, values used for this were A=2 B=9 C=6 D=4 E=3 F=8) 10 pairwise tests. About half the max of 6x5/2 possible pairwise tests. What is not clear to me is how we might code this "logic" into an algorithm. |
|
Alert | IP |
Printer-friendly page |
Reply |
Reply With Quote | Top |
|
|
|
 |
mr_homm
Member since May-22-05
|
Apr-10-06, 10:06 PM (EST) |
 |
3. "RE: Rank order from the min number of paired comparisons"
In response to message #2
|
Hi John, >Thanks Stuart for the quicksort ideas and algorithm .. I >will program your implementation some time. > >Sorting won't work because I don't know the values of the >items (sorry if I did not make this clear). I can only >ascertain which of a pair is better after doing a test. Like >a tennis tournament. Quick-sort can handle that kind of situation. If your information is not numerical, you can't take advantage of the bit comparison tricks I mentioned, but the comparison of two numbers works just like the playing of a tennis match. There is one assumption underlying all this: namely, that there IS an actual definite order that does not depend on how the comparisons are done. In real life, it is certainly possible (due the the particular strengths and weaknesses of individual players) that the outcome of a tennis tournament could be different if the matches were played against different initial opponents. But if there is an objective order and your comparisons are discovering it (rather than creating it) then the order of the comparisons cannot matter. Here is a version of quick-sort that obviously does work for your situation. I'll state it in the language of a tournament among N players: Rule 1: Pick a player at random, and let him play against every one of the other players. Then if M-1 players will beat him, he will beat N-M players , so his rank in the final order is completely determined. He is the Mth best player, and he need never play again. There is also now no need for anyone he beat to play against anyone who beat him, so the tournament has now broken down into two smaller sub-tournaments of size M-1 and N-M. Rule 2: Find the largest remaining sub-tournament and apply Rule 1 to it. Rule 3: When the largest remaining sub-tournament has size=1, the whole tournament is finished. At this point, the rank of each player has been determined exactly. Just as with the numerical quick-sort, this algorithm will be expected to complete in Nlog(N) time, with the worst case as N^2 time. Rule 2 is not really necessary, as you could just choose ANY sub-tournament instead of the largest one, but I like keeping the sizes of the sub-tournaments roughly equal for some kind of instinctive aesthetic reason. It doesn't make the algorithm any more efficient, just more definite. > >afaik, a tournament is usually "programmed" as >a) initial random pairings >b) winner of each pair plays winner of any other pair >selected at random >c) process terminates when only 1 pair remains, and the >winner of that pair is determined > >So, with M players I can determine the winner in M-1 >matches. > >This only gives me the winner (rank 1). But not the complete >rank order, although there is some partial information. > I think this choice of "program" has more to do with human ideas of what "seems fair" to the players than algorithmic efficiency. Quick sort would make a horrible sporting event, with one guy playing until he was worn out, and then going home, and half the remaining people playing some other guy and KNOWING already that they can't win the tournament. Not the sort of action to keep sports fans on the edges of their seats! The kind of logic you were outlining would be very hard to program, since it involves an analysis at each stage of what matches would be productive to test. The logic is necessarily somewhat ad hoc, because the situation keeps changing at each stage. The nice thing about quick-sort is that it neatly segregates the productive matches from the unproductive ones at each step, removing the need for any analysis. --Stuart Anderson
|
|
Alert | IP |
Printer-friendly page |
Reply |
Reply With Quote | Top |
|
|
|
 |
Mutley2003

guest
|
Apr-11-06, 07:10 PM (EST) |
|
4. "RE: Rank order from the min number of paired comparisons"
In response to message #3
|
Thankyou, Stuart, for your very clear exposition. The quicksort/partitioning method may indeed be the best we can do, and I won't press you to think about it further. As you say, this would make a horrible sporting event (because of the one against all the others sequence, and because the worst case is O(N**2) ) , and similar considerations apply in my context - I am asking people to state their preference for a given pair (say of cities) with the objective of deducing their underlying full rank order (of course I could always ask them for that directly, but there are theoretical reasons why not). At the back of my mind was the possibility that it would not matter if a) we did not determine the high ranks (the losers) very accurately b) some uncertainty would be acceptable about the exact ranks. I realized I did not know anything about the seeding algorithm used in tennis, so I googled and found "Finding the largest and second largest" https://www.seeingwithc.org/topic3html.html which appears to have something in common with my proposed "algorithm" - the first tournament is pairwise at random, the second tournament is amongst all those who have lost to the winner. It looks somewhat promising, so I will study it further. also https://www.oxfordcroquet.com/tech/appleton/index.asp Thanks for your input - it has helped me get a clearer idea of where to go regards
|
|
Alert | IP |
Printer-friendly page |
Reply |
Reply With Quote | Top |
|
|
|

You may be curious to have a look at the old CTK Exchange archive. Please do not post there.
|Front page|
|Contents|
Copyright © 1996-2018 Alexander Bogomolny
|
|