6.06.2010

More On Probability

This might drive home how slippery a subject probability is. Warning: anyone not terribly interested in the finer points of how probabilities of events are calculated risks being rendered unconscious by reading this post. It is not building up to any profound philosophical or political insights, and there's no payoff for you if you wade through it to reach the end... it's just arguing about how to properly move numbers around.

On the other hand if you are such a hopeless geek that you consider reading about that to be a payoff in itself, carry on.

There is a raging debate going on in certain quarters about the following question:

"I have two children. One is a boy. What is the probability I have two boys?"
The two answers most people are arguing over are 1 in 2 (50%), and 1 in 3 (33%).

This is an example of a question where overthinking the problem gets you into trouble. I generally hate to use the term "overthinking"; in most cases, as far as I'm concerned, there's no such thing as too much thinking... but there are exceptions to every rule and this is one of them.

First, I'm going to do this the easy way.

One kid is a boy. One kid is of unknown gender. Assuming, for the sake of simplicity, no biological bias towards a kid being either gender, the odds of the unknown kid being a boy are 50%. Therefore the odds of there being two boys are 50%.

That seems pretty obvious, right? Now let me take you on a trip through the wonderful world of people who love to overcomplicate things and trip themselves up.

Now, generally speaking there are two ways to approach a problem like this:

1. Start with what you know about the situation, and construct a matrix of all possible outcomes given that information. (The easy way we just used.)

2. Start with a matrix of all possible outcomes assuming you know NOTHING, then start introducing information and eliminating, one by one, the outcomes that information makes impossible. (a.k.a. the hard way that is just asking for trouble and gives us answers like 13/27 when people do it wrong.)

Now, for an example of this calculation gone wrong, you can see write-ups of it by a couple of its advocates. One is at the New Scientist... and one is in an article in the NY Times. Neither of these people is anywhere in the neighborhood of being clueless about probability, but that isn't stopping them from making the same error: overcomplicating a simple situation while thinking that what they're really doing is revealing a profound counter-intuitive truth that the general public just doesn't understand probability well enough to grasp. The New Scientist write-up is particularly mind-bending, since it decides to also introduce the information that the boy we know about was born on a Tuesday, then insists that it matters. (It doesn't.)

For the purposes of illustrating the concept, I'm just going to deal with the approach in the Times to the simpler problem, walk through it, and show where it makes its mistake.

Step 1.  Define your search space by listing ALL possible two child combinations by gender.

GG  (girl/girl)
BG (boy/girl)
GB (girl/boy)
BB (boy/boy)

Simple right? Not really. While the listed combinations are 100% correct, the manner in which they have been represented here is setting the stage for the error to come. The designations "B" and "G" are not adequately specific and will cause the people using this method to lose track of what they're really dealing with later.

Step 2. Assign probabilities to all the possible combinations.

GG = 25%
BG = 25%
GB = 25%
BB = 25%

This is completely correct.

Step 3. Introduce the information about the two children we have been given and adjust odds accordingly.

"One is a boy" tells us at least one of the children is absolutely, definitely, 100% certainly a boy.

So people following this approach do what? Well obviously they eliminate the "GG" option because that's now impossible:

GG = 25% (eliminated)
BG = 25%
GB = 25%
BB = 25%

And adjust the odds of the remaining possibilities accordingly:

BG = 33%
GB = 33%
BB = 33%

Ta Da! The odds of having two boys are only 33%!!!!
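For anyone who wants to see that bookkeeping spelled out, here is a minimal Python sketch of the elimination steps exactly as this approach performs them. The names (combos, remaining, rescaled) are just made up for illustration.

```python
from fractions import Fraction

# Steps 1 and 2: every two-child combination starts at 25%.
combos = {
    "GG": Fraction(1, 4),
    "BG": Fraction(1, 4),
    "GB": Fraction(1, 4),
    "BB": Fraction(1, 4),
}

# Step 3 as this approach does it: drop any combination with no boy in it...
remaining = {name: p for name, p in combos.items() if "B" in name}

# ...then rescale whatever is left so it adds up to 100% again.
total = sum(remaining.values())
rescaled = {name: p / total for name, p in remaining.items()}

for name, p in rescaled.items():
    print(name, p)
```

Each of the three surviving combinations ends up at one third, which is where the 33% figure comes from.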

And, the error has been made. They didn't finish accounting for the information that one of the kids was DEFINITELY a boy before rushing on to declare what the odds were. And the reason they missed it is at least partly because of how they presented the information in step 1. There is more information content in that statement than they are seeing. Here's what they should have done (IF they insist on using this silly, unnecessarily complicated method in the first place, that is):

Step 1. Define your search space by listing ALL possible two child combinations by gender.

G(1)G(2)
B(1)G(2)
G(1)B(2)
B(1)B(2)

See, here was the problem the first time around. It is not wrong to list BG and GB as two distinct and separate possible outcomes. But in doing so you are also stating that the order in which a given "B" and "G" are placed MATTERS. So not keeping track of which one you're talking about will cause you problems later.

Step 2. Assign probabilities to all the possible combinations.

i: G(1)G(2) = 25%
ii: B(1)G(2) = 25%
iii: G(1)B(2) = 25%
iv: B(1)B(2) = 25%

Same as before... still completely correct...

Step 3. Introduce the information about the two children we have been given and adjust odds accordingly.

"One is a boy" tells us at least one of the children is absolutely, definitely, 100% certainly a boy.

It doesn't tell us which one. Does that mean we ignore that detail? No. That means we have to take into account BOTH possibilities. EITHER the boy we have been told about is B(1) OR the boy we have been told about is B(2). Both of these situations are possible.

IF he's B(1) then we get this:

i: G(1)G(2) = 25% (eliminated)
ii: B(1)G(2) = 25%
iii: G(1)B(2) = 25% (eliminated)
iv: B(1)B(2) = 25%


...because child (1) can't be a girl if it's a boy (duh). So NOT ONLY is a GG combination impossible, the odds of us having a mixed gender combo have been impacted. On the other hand IF the boy we have been told about is B(2) we get this:

i: G(1)G(2) = 25% (eliminated)
ii: B(1)G(2) = 25% (eliminated)
iii: G(1)B(2) = 25%
iv: B(1)B(2) = 25%

...because child 2 can't be a girl if it's a boy (duh again).

Now... to repeat, we DO NOT KNOW which of these situations we are dealing with. It could be either one. So what is the FULL range of possibilities we now face?

The boy is B(1), child (2) is a girl = B(1) G(2) = 25%
The boy is B(1), child (2) is a boy = B(1) B(2) = 25%

The boy is B(2), child (1) is a girl = G(1) B(2) = 25%
The boy is B(2), child (1) is a boy = B(1) B(2) = 25%

Now, two of those possibilities have two boys in them, and their odds add up to 50%. The correct answer, the same as the one we get when we do it the much, much simpler way at the beginning of the post.
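That four-row table can be reproduced mechanically. This is only a minimal sketch, and it leans on one assumption the table itself makes by giving every row 25%: the boy we were told about is equally likely to be child (1) or child (2). The names (outcomes, mentioned_is_boy and so on) are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# Assumption: each child is equally likely to be a boy or a girl, and the
# child we were told about is equally likely to be child (1) or child (2).
# That gives 8 equally likely (first, second, mentioned) outcomes.
outcomes = list(product("BG", "BG", (1, 2)))

def mentioned_is_boy(first, second, mentioned):
    """True when the child we were told about really is a boy."""
    return (first if mentioned == 1 else second) == "B"

# Keep only the outcomes consistent with "one is a boy" under this reading.
consistent = [o for o in outcomes if mentioned_is_boy(*o)]
two_boys = [o for o in consistent if o[0] == "B" and o[1] == "B"]

prob_two_boys = Fraction(len(two_boys), len(consistent))

for first, second, mentioned in consistent:
    print(f"The boy is B({mentioned}): {first}(1) {second}(2)")
print(prob_two_boys)
```

The four consistent outcomes are the four rows listed above, and two of them are boy/boy.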

Or, if you want to think of it another way... before we were told one of the kids was a boy the odds of each of the kids being a girl were 50%. There was a 50% chance the "first" kid was a girl, and there was a 50% chance the "second" kid was a girl.

Once we were told one of the kids was a boy, however, the odds of one of the kids being a girl dropped to 0%... because we know one kid is a boy. We don't know WHICH kid that happened for, but we do know it happened. And the people who argue the answer is 33% and not 50% are not accounting for that detail.

I anticipate that if any advocates of the second approach wander across this, they will proceed to lecture me sternly that I don't know what I'm talking about. I'm guessing they will claim that, by applying the level of specificity I did to the labels for each child, I claimed to know whether the boy discussed at the beginning was the oldest or the youngest, even though I specifically stated that we don't know that and based the calculations on that being an unknown factor. (I don't know why they do this, but it's been an ongoing theme.)

The counter argument goes like this. Once they have reached the stage where they eliminate the GG combo as impossible:

BG = 33%
GB = 33%
BB = 33%

They start demanding you tell them which kid is the boy. Is it the first one? You don't know!?!?!? Then that one is still possible! Is it the second one? You don't know!?!?!? Then that one is also still possible!!!!

And they're right, but still wrong. See, the total range of options here is not "impossible or JUST AS possible as it was before".

To demonstrate why, let's play poker (my own unique variant at least).

We have just started a fresh deck of 52 cards. I have dealt you 5 cards, and myself 5 cards. You have looked at 4 of your cards, and one is face down on the table and you don't know what it is. The 4 cards you can see are:
2, 3, 4, 6 (different suits)

You are really hoping that card on the table is a 5. But you have to bet before picking it up and looking at it. So you figure out all the possible straights you could end up getting:

2,3,4,5(spades),6   = x%
2,3,4,5(diamonds),6   = x%
2,3,4,5(clubs),6 = x%
2,3,4,5(hearts),6 = x%

(I don't feel like crunching the numbers and it isn't necessary to demonstrate the concept)




You add up all those percentages to get your total odds of holding a straight right now. And you're all ready to make your bet.

Then I tell you I'm holding a 5. (Let's say the rules of the game specify I'm not allowed to lie about my cards, so if I say it it's true)

Uh oh. I didn't tell you which 5. So using the argument of our friends above:

Which 5 is it?
Is it the 5 of spades? You don't know!?!?!? Then the 2,3,4,5(spades),6 is still possible! You can't eliminate it!
Is it the 5 of diamonds? You don't know!?!?!? Then the 2,3,4,5(diamonds),6 is still possible! You can't eliminate it!
Is it the 5 of clubs? You don't know!?!?!? Then the 2,3,4,5(clubs),6 is still possible! You can't eliminate it!
Is it the 5 of hearts? You don't know!?!?!? Then the 2,3,4,5(hearts),6 is still possible! You can't eliminate it!

All true statements by the way.
And then you conclude that since you can't eliminate any specific combination up there they are ALL still in play simultaneously and you still have the exact same odds of holding a straight. If you actually thought THAT was a true statement I would be asking you over for a cash game real soon. You see, it doesn't matter that we don't know WHICH possibility to eliminate. All that matters is that we know one of them is no longer possible, and adjust the TOTAL odds accordingly.
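For anyone who does feel like crunching the numbers, here is a rough sketch of the before and after odds that your face-down card is a 5. It assumes "I'm holding a 5" means at least one of my five cards is a 5, and that nothing else about either hand is known. The helper p_no_five and the variable names are made up for illustration.

```python
from fractions import Fraction
from math import comb

UNSEEN = 52 - 4  # cards you have not seen: everything except your 2, 3, 4, 6
FIVES = 4        # all four 5s are somewhere among the unseen cards

# Before I say anything: chance your face-down card is a 5.
before = Fraction(FIVES, UNSEEN)

def p_no_five(pool, fives):
    """Chance my 5 cards contain no 5 when drawn from `pool` cards holding `fives` 5s."""
    return Fraction(comb(pool - fives, 5), comb(pool, 5))

# Assumption: "I'm holding a 5" means at least one of my five cards is a 5.
# P(your card is a 5 AND I hold at least one 5): if your card is a 5, my hand
# comes from the other 47 unseen cards, which contain the remaining three 5s.
both = before * (1 - p_no_five(UNSEEN - 1, FIVES - 1))

# P(I hold at least one 5), with my hand drawn from all 48 unseen cards.
i_hold_a_five = 1 - p_no_five(UNSEEN, FIVES)

after = both / i_hold_a_five

print(float(before), float(after))
```

Under those assumptions the second number comes out smaller than the first, even though no individual 5 has been ruled out.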

41 comments:

  1. I admit I haven't read your entire post closely, but I want to explain to you why I think the answer is 33% and maybe you can copy-paste the part of your post where you point out my mistake.

    We agree that the probabilities for having a BB, GB, BG or GG family are equal, 25%. Let's say we have 1000 families of each. So now let's take just the families with AT LEAST one boy. We discard the GG option and we are left with 3000 families. Of the 3000 families, there are 2000 families with mixed gender kids and 1000 all-boy families. Ergo, 33%. What am I missing?

    It's like flipping two coins at once, the probability of the two coins landing on different faces (=mixed gender) is twice as big as the probability of both of them landing on heads (=all-boy).

    (reposting, I don't know if it worked the first time)

  2. Note: The first one did work but I'm slow on the comment moderation sometimes and they don't show up until I see them.

    The approach you're using is the same one I discuss in the post actually. It's incompletely accounting for the information we have been provided with once we are told one of the children is a boy.

    Once you are given that information you have gone from having two unknown variables each with values of 50% boy or 50% girl... to one of them being 100% definitely a boy.

    Now, people always see that this obviously eliminates the GG possibilities. However, after that, since they have not been told *which* child is 100% definitely the boy, they can't specifically eliminate either the BG or the GB options, and they simply leave them in play with the probabilities of each of them occurring untouched. Which is wrong.

    ONE of those combinations is now impossible as well. (Either the kid you have been told about is the "first" kid and GB is now impossible... or the kid you have been told about is the "second" kid and BG is now impossible).

    Knowing which one it is is not required to properly account for the effect that has on the available probable outcomes.

    The poker analogy at the end of the post was intended to help illustrate why that is. Once you remove one of those 5s from the deck each individual straight is still possible because you can't eliminate any one of them specifically without knowing the suit of the 5 that was removed. At the same time the COLLECTIVE odds of drawing a straight have been reduced.

    Same thing for the BG/GB combos. Either one is still possible in the sense that you don't know which kid is definitely a boy (and thus definitely NOT a girl) but you know one of the girls has been effectively "removed from the deck" so to speak.

  3. I don't quite follow. I get why the cards example is the way it is. Because you provide information that one 5 is out of the deck. I don't see how this is analogous to the boy/girl case, sorry.

    Let's take coin flips. If I flip two coins at once, the probability of them landing on different sides is 50%. The probability of both landing heads is 25%. So if I want to examine the outcomes that have at least one heads, this condition will be satisfied in 75% of the coin flips, of which 2/3 are different-side outcomes. It seems pretty straightforward to me.

    The only way that the coin flip example does not apply to the boy/girl case is if the question that's being asked is not the same.
    If I say "I have flipped two coins. At least one of them landed heads. What's the probability that the other one is also heads?" The answer is clearly 1/3 because the sample space for different-side outcomes is twice as large as the one for heads-heads. I can easily verify this empirically.
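    A check like that could be sketched as below. This is just a minimal simulation, assuming fair coins and keeping every pair of flips that shows at least one heads; the seed and trial count are arbitrary choices.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 100_000

at_least_one_heads = 0
both_heads = 0
for _ in range(trials):
    a = rng.choice("HT")
    b = rng.choice("HT")
    if a == "H" or b == "H":
        at_least_one_heads += 1
        if a == "H" and b == "H":
            both_heads += 1

# Share of all double flips that show at least one heads.
print(at_least_one_heads / trials)
# Share of those kept flips in which both coins are heads.
print(both_heads / at_least_one_heads)
```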

  4. Sorry for the delay responding, was having a wisdom tooth pulled yesterday. Ouch.

    We are not in disagreement about the odds involved in what outcomes will be generated when you are randomly flipping two coins. This tends to be the point I can never move past when discussing this with people... this:

    HH: 25%
    HT: 25%
    TH: 25%
    TT: 25%

    ...(or substitute boy/girl for heads/tails) is not in dispute. The problem is that, with the information we have been given, that is not the situation we are dealing with. We are flipping one coin. The other coin is not being flipped; it's being laid down on the table and we know what it is. It's heads (boy).

    Now, what are the odds the OTHER coin is the same when we flip it?

    To argue that it is 33% heads (boy) and 66% tails (girl) just because I told you there's another coin lying on the table nearby that is heads up... but that if I take that other coin away the odds of a coin flip change back to 50/50... is to commit a variant of the gambler's fallacy: thinking past outcomes alter future probabilities in random events. "Well, I already flipped heads, the next one will probably be tails now!"

    Or, if you want to just get rid of the coins... "I already have a boy, this biologically preconditions future sperms and eggs to be twice as likely to produce a girl" (or vice versa).

    You know that's simply not true. But you maneuver yourself into making that argument without realizing it when you approach the probability calculation in the manner you do here. To avoid the confusion, the easiest way to do it is to realize what your variables are... and that's the gender of one kid. The other kid you have been told the gender of; that's therefore a fixed constant and has nothing to do with the probability calculation. The only reason I even factored the possible genders of two children in at all in this example is because everyone I discuss this with insists on doing it that way, even though it's unnecessarily complicated and misleading.

    "I don't quite follow. I get why the cards example is the way it is. Because you provide information that one 5 is out of the deck. I don't see how this is analogous to the boy/girl case, sorry."

    When you say one kid is a boy, one girl is "out of the deck". That kid that is a boy cannot be a girl. Now it is only possible for the other kid to be a girl... or not. By insisting that the search space still contains both the BG and GB combinations at the full initial probabilities you started with, when the genders of both children were unknown, you are effectively saying that it's still possible for the child you have been told is a boy (even if you don't know which that is) to be a girl anyway.

  5. After reading the Wikipedia entry on this problem (Boy or Girl paradox), it seems we are both right, because the original question is ambiguous.

    The answer can be either 1/2 or 1/3, depending on how you obtain the information that at least one child was a boy.


    If you select a family at random and then take a look at the gender of just *one* child and you only "keep" the family in the sample space if the selected child is a boy, then surely the end result is 1/2 (because if the selected child is a girl, you discard the family from the sample space, even if the other child is a boy).

    The way I understood the question, you take a look at all the families as a whole (two children) and reduce the sample space to just the families with at least one boy (= remove only the GG families). In this case the answer is 1/3.

    In the Wikipedia article, the problem is posed in the form of two different questions and I understood that the implied distinction is between a certain 'sequence' of genders (older child is a boy) vs. 'the sequence doesn't matter' (that is, the younger child can also be a boy).

    If you apply this to coin tosses, the questions are (let's take 20 coins):

    "What is the probability of tossing a *certain* sequence of 10 heads and 10 tails vs. tossing 20 heads?" (The answer here is: equal, both are 2^(-20))

    "What is the probability of tossing any 10 heads 10 tails sequence vs. tossing 20 heads?" (In this case, the first probability is much higher of course, since there are 184756 different ways of tossing 10H 10T)

    So I guess we both answered correctly, the problem lies in the ambiguity of the question.
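    The two coin-toss figures quoted above are easy to check. A minimal sketch, assuming fair coins and independent flips; the variable names are made up for illustration.

```python
from fractions import Fraction
from math import comb

FLIPS = 20

# Any one specific sequence of 20 fair flips (all heads, or one particular
# 10H/10T ordering) has probability 2^(-20).
p_specific = Fraction(1, 2 ** FLIPS)

# Number of distinct orderings with exactly 10 heads and 10 tails.
orderings = comb(FLIPS, 10)

# Probability of getting 10 heads and 10 tails in any order.
p_any_ten_ten = orderings * p_specific

print(orderings)
print(float(p_specific), float(p_any_ten_ten))
```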

    P.S. I hope the tooth didn't take any wisdom with it :)

  6. Yes, I'm familiar with the "it depends on how you read the question" argument as well, but I don't agree with it. Let's try one more way of walking through this and see if it's more clear... I'm going to have to bring cards back in again.

    People who say the second approach is valid are doing the following (as you're aware, but I need to walk through it again anyway so I can draw a comparison):

    "I have two children..."

    Ok then! Combinations are:

    GG: 25%
    BG: 25%
    GB: 25%
    BB: 25%

    ..."...one is a boy..."

    Ah! Well then we can eliminate GG... but we can't eliminate BG because there's a boy there, and we can't eliminate GB because there's a boy there.. and we can't eliminate BB because there's a boy there... so the FULL search space of those combinations is still available:

    BG: 33%
    GB: 33%
    BB: 33%

    That last sentence is just wrong. The fact that you are unable to specifically eliminate either BG or GB does not mean the odds of their occurrence are unaltered by the information provided.

    To demonstrate why, I have to go back to poker hands again.

    If we were to believe the Wikipedia argument to be correct then we would have two valid ways of calculating odds... one if we're told the suit of the card removed (which child is the boy) and one if we are not told the suit of the card removed (if we don't know which child is the boy). That would be the equivalent of this:

    "I have a 2,3,4 and 6 and am trying for a straight..."

    Ok then! Combinations are:

    2, 3, 4, 5(spades), 6 : N%
    2, 3, 4, 5(hearts), 6 : N%
    2, 3, 4, 5(clubs), 6: N%
    2, 3, 4, 5(diamonds), 6: N%

    Total odds: 4N%

    "...one 5 is out of the deck..."

    Uh-oh... we cannot eliminate any specific combination above. If we believe the argument contained in the wiki that means the FULL search space of all combinations we could not eliminate still exists. The odds have not been altered.

    You and I both know that is wrong.

    The problem is not discriminating between each combination still being individually possible, and all four combinations still being simultaneously possible. As it stands now, all four of those combinations above are really still possible to achieve, since it's possible any of those 5's is NOT the 5 that was removed from the deck. However, instead of having one possible search space consisting of four different simultaneously possible hands... as we had before we were told one 5 was out of play... what we have now is four DIFFERENT possible search spaces, each consisting of 3 possible straights:

    1: The removed card was the spade... leaving:

    2, 3, 4, 5(hearts), 6 : N%
    2, 3, 4, 5(clubs), 6: N%
    2, 3, 4, 5(diamonds), 6: N%


    2: The removed card was the heart... leaving:

    2, 3, 4, 5(spades), 6 : N%
    2, 3, 4, 5(clubs), 6: N%
    2, 3, 4, 5(diamonds), 6: N%

    Etc... for the 5 that was removed being the club or diamond.

    And no matter which of those scenarios we are in, the odds of a straight are now 3N%... not 4N% (well, not exactly 3N%... the total number of cards in the deck is reduced by one, which shifts things a bit, but you get the point).

    Now, take that back to boys/girls.

    When we are told one of the kids is a boy, even though it isn't specified which kid it is (or as the wiki article puts it we don't go pick one child), we are NOT left with this situation:

    BG: N%
    GB: N%
    BB: N%

    We are left with one of two possible situations:

    1. The boy we were told about was the "first" child... in which case:

    BG: N%
    GB: N%

    2: The boy we were told about was the second child... in which case:

    GB: N%
    BB: N%

    Now we could be in EITHER situation 1 or situation 2... yes, it is still possible we're in a situation where you could have a GB or in a situation where we could have a BG since... but not at the same time.

  7. Oops... in that final scenario 1 that should of course be:

    1. The boy we were told about was the "first" child... in which case:

    BG: N%
    BB: N%

    ...

    (I appreciate this may seem like I'm just repeating myself, but I think my initial demonstration of this principle was unclear about exactly how the card example was relevant to the child example, and being more explicit about the difference between multiple possible combinations in a search space being possible and being *simultaneously* possible was needed.)

  8. »The fact that you are unable to specifically eliminate either BG or GB does not mean the odds of their occurrence are unaltered by the information provided.«

    Why did you leave out the BB option in the above sentence? »One is a boy« is equally valid for all three options, why would you now focus just on the BG and GB case? If what you're saying is true, then just the fact that we eliminated GG has somehow shifted the balance of probabilities in the remaining three cases. This does not make any sense, since the cases are independent. And you can't just eliminate »one« of them, you have to tell me which one!

    The statement »One is a boy« is *equally* true for all three remaining cases. Now why would you now take just the GB and BG case and start comparing them and eliminate one, because they can't be both true at the same time? There is no justification for this. All three cases are symmetric, I could make the same argument by taking BG and BB, for example.

    "...one 5 is out of the deck..."

    »Uh-oh... we cannot eliminate any specific combination above. If we believe the argument contained in the wiki that means the FULL search space of all combinations we could not eliminate still exists. »

    The full space (meaning all four options) is STILL there, but the probability for each of them has changed because, in addition to the probability of getting a specific 5, you have to multiply this by the probability that that specific 5 is still in the deck (75%). This is not analogous to anything in the boy/girl problem. To put it another way, the reason the probability changes is that you have many non-straight-forming options in the sample space, which stay the same. By removing a 5, you change the balance of probabilities for a straight vs. not a straight. The analogy is not correct.

    »The odds have not been altered«

    The odds of getting a straight have indeed altered. But that's because you removed the 5 and changed the sample space! What I want to know is what *justification* you have for removing a case in the B/G problem. This is not analogous to the card case. In the card case you just remove the 5 because you decide to. But you can't just decide this in the B/G problem. The analogy does not justify removing one case.

  9. "...one 5 is out of the deck..."
    »Uh-oh... we cannot eliminate any specific combination above. If we believe the argument contained in the wiki that means the FULL search space of all combinations we could not eliminate still exists. »

    The full space (meaning all four options) is STILL there, but the probability for each of them has changed because, in addition to the probability of getting a specific 5, you have to multiply this by the probability that that specific 5 is still in the deck (75%). This last factor is not analogous to anything in the boy/girl problem. To put it another way, the reason the probability changes is that you have many non-straight-forming options in the sample space, which stay the same. By removing a 5, you change the balance of probabilities for a straight vs. not a straight. The analogy is not correct.
    »The odds have not been altered«

    The odds of getting a straight have indeed altered. But that's because you removed the 5 and changed the sample space! What I want to know is what *justification* you have for removing a case in the B/G problem. This is not analogous to the card case. In the card case you just remove the 5 because you decide to. But you can't just decide this in the B/G problem. The analogy does not justify removing one case.

    When you remove a card, you are *de facto* diminishing the sample space and consequently the odds change, that's clear. For the boy/girl problem, knowing which child is the boy is NOT analogous to knowing the suit of the removed card. Removing a 5 is not analogous to removing one of the BG/GB options. Removing a 5 does change the odds, that's obvious. This is analogous to removing the GG case, which ALSO changes the odds for the remaining 3 cases. They go from 25% to 33%.

    Two outcomes not being possible at the same time is not the same as *removing* one possibility from the sample space! When we pick a random family, we have four possibilities. If the family turns out to be BG, it's not one of the other three. This is a tautology! All you're saying is that if I have this outcome, I don't have the other. Saying that we should effectively remove one case just because they're not both possible at the same time is like saying the odds of me throwing a specific number on a die is 100% for all six numbers because the die can land just on one face at a time and so 5 of them need to be removed from the sample space.

    Look, we both agree that in the beginning all four options have 25%. From this it directly follows that if you pick a *family* at random, the odds of you picking a mixed-gender family are twice as large as the odds of you picking an all-boy family. This is trivial! We surely agree on this point. We also agree that the statement »One is a boy« is TRUE for 75% of the families. The Wikipedia article is correct, it all depends on how you arrive at your sample space.

  10. Ok, answer this for me really quick.

    "BG" is a distinct and separate search space from "GB", despite their both containing the exact same values (one boy and one girl), due simply to the placement of the "B" and "G" in either the first or second position in the sequence, correct? The position of either the "B" or the "G" matters, in a concrete, quantitative way. Agreed?

    If so, try making that more explicit and easier to keep in mind and it may help:

    G(1) G(2) : 25%
    B(1) G(2) : 25%
    G(1) B(2) : 25%
    B(1) B(2) : 25%

    Objections?

  11. No objections there, I agree with the probabilities. The positions themselves don't really matter unless you are interested in a specific arrangement. Which we are not. If you just want to know if a family has one boy and one girl, the order does not matter. Same in flipping 20 coins. If you want to know the probability of getting 10 heads and 10 tails, you need to take *all* the options (184756 of them) into account. It's exactly the fact that changing the order yields the same result 184756 times that gets you the *much* higher probability of getting 10 heads and 10 tails than of getting all 20 heads.

    This is where we diverge. You are determining the »kind« of a family by looking at the gender of one child. This, of course, determines the sequence of genders (you are in fact discarding half of the BG/GB families because if the gender of the selected child happens to be G, you discard that family from the sample space, even if the other child is a boy). I'm looking at whole families (both children) and keeping the families which satisfy the condition »one is a boy«. This is the cause of our 1/2 vs 1/3 discrepancy.

    Also, a quick example why your reasoning with the cards is not correct:

    Let's say we have 4 aces on the table, face down. If I ask you to choose an ace randomly, what is the probability that you will choose A(spades)? It's 25%.

    Now I remove one of the 4 aces from the table (I don't know which one I removed and neither do you). According to your reasoning I have now "effectively" reduced the sample space by 1 (we don't know which one). This is incorrect.

    The sample space is a set of all possible *outcomes*. True, there are just 3 aces on the table, but there are still 4 possible outcomes when choosing one. That's why it's still possible to calculate the probability of getting the A(spades). It's still 25%. What else could it be? Should we, according to your reasoning, say that »the probabilities are 33% (for the three aces remaining) and 0% (for the removed ace)«? This makes no sense because we don't know which one is removed. But if I tell you which one I removed by looking at it, the odds *do* change and the sample space *is* reduced (one option becomes impossible).

    If I now remove two more aces and you are left with one and I now ask you »Choose an ace (well, there's only one left), what's the probability it's the A(spades)«? It's clearly still 25%, because the 4 possible outcomes remain. How would *you* calculate the odds? Would you say »the probability is 100% for the ace left and 0% for the other three«? That's the same as saying »the odds of throwing a 6 are 100% if you throw a 6, and 0% otherwise.« Go back to the definition of probability: number of favorable outcomes divided by all possible outcomes. It all starts here.

    The distinction you are making between *individual* and *simultaneous* possibilities is meaningless. We are always choosing *one* card so I'm not sure what »simultaneous« could possibly mean in this case.

    If you have any references that show that you can reduce the sample space the way you do, let me know. I have also never encountered the distinction individual/simultaneous when dealing with single independent events. If what you're saying is correct, it means you're proving many mathematicians wrong, because I don't know of any peer-reviewed articles that prove that 1/3 is wrong for every interpretation of the question (these are all arguments from ignorance, I know). Why don't you publish this if you think it's correct?

    It has been an interesting discussion but it seems we are running in circles and it's taking up a lot of my time and energy. I'm looking forward to your future blog posts, I especially like the ones on evolution, which is the original topic that brought me to your blog in the first place.
    Cheers!

  12. Ok then, with no objection we'll move on after I comment on your analysis of the cards situation.

    Although the answer you arrived at was correct (yes, the odds of drawing the spade are in fact still 25% in that situation) you did not apply a treatment to the probabilities involved in the four aces example that was analogous to my own. If we remove one card but do not know what it is then what we are left with is one of four possible scenarios (as we were left with one of two possible scenarios when one of the children was gender-identified without us knowing which child it was)... not just a single scenario with a 3 card search space:

    1: A(spade), A(club), A(diamond) ... Odds of drawing the A(spades) = 1/3

    OR

    2: A(spade), A(heart), A(diamond) ... Odds of drawing the A(spades) = 1/3

    OR

    3: A(spade), A(heart), A(club)... Odds of drawing the A(spades) = 1/3

    OR

    4: A(heart), A(club), A(diamond) ... Odds of drawing the A(spades) = 0/3

    Total odds = 3/12 = 25%. As you see, the correct answer is arrived at.
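
    The four-scenario total above can be reproduced with a short sketch. It assumes each ace is equally likely to be the one removed and that the draw from the remaining three is uniform; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

ACES = ["spades", "hearts", "clubs", "diamonds"]

def p_spades_after_blind_removal():
    """Average P(draw spades) over the four equally likely 3-card tables."""
    tables = list(combinations(ACES, 3))  # one table per possible removed ace
    total = Fraction(0)
    for table in tables:
        p_table = Fraction(1, len(tables))           # each removal equally likely
        p_draw = Fraction(table.count("spades"), 3)  # 1/3 or 0/3
        total += p_table * p_draw
    return total

print(p_spades_after_blind_removal())  # 1/4
```

    Three tables contribute 1/4 x 1/3 each and the fourth contributes nothing, giving 3/12.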

    Now... moving back to our boys and girls. I am aware that the "1" and "2" designators are important in the sense of differentiating one value from another in the combination but do not have any actual real relevance (they can stand for the one on the left and the one on the right... for the youngest and the oldest, for whatever... irrelevant... they're just helping us keep track of things).

    That said, we are in agreement here:

    G(1) G(2) : 25%
    B(1) G(2) : 25%
    G(1) B(2) : 25%
    B(1) B(2) : 25%

    Are we also in agreement that the math for how we arrive at those values is as follows:

    Odds of B(1) = 50%
    Odds of G(1) = 50%
    Odds of B(2) = 50%
    Odds of G(2) = 50%

    Odds of G(1) G(2) = 50% x 50% = 25%
    Odds of B(1) G(2) = 50% x 50% = 25%
    etc...?

    That that is the actual manner in which the raw numbers are crunched to arrive at the final values?

  13. »... as we were left with one of the possible scenarios when one of the children was gender-identified without us knowing which child it was ...«

    Here lies the whole point of our disagreement which I'm trying to point out all along. The above statement describes the way *you* are arriving at your sample space: What you do is:

    1.) Take a random family
    2.) Identify the gender of ONE child (doesn't matter which)
    3.) If the identified child is a boy, keep the family in the sample space, otherwise discard it

    From this procedure, it is clear that you will only keep half of the families (you will discard all GGs and half of the BGs). It clearly follows that the probability is 1/2 and you are correct.
    I'm taking a different approach:

    1.) I take a random family
    2.) I ask the mother »Is at least one of your children a boy?«
    3.) If she says »yes« I keep the family, if she says »no« I discard it.

    In this way I'm left with a set of families for which the mother can truthfully say »One of my children is a boy«. This procedure *does not* contradict the premises of the original problem in any way. The original problem is ambiguous since both approaches are valid. Essentially what we're arguing about is:

    YOU: If you construct the sample space in this way, the probability is ½.
    ME: Yes, but I'm using another approach, which is equally valid and yields a 1/3 probability.
    YOU: No, you can't get a 1/3 prob., using my approach it's ½.
    ME: Yes, but I'm not using your approach...
    YOU: But in my approach it's ½...
    Etc, etc ...

    If you take all the two-child families in the world, about half of them will be mixed-gender combos, and 25% all-boy. It is trivially clear that for roughly 75% of the world's two-child families, the mother can truthfully say »I have two children. One is a boy.« My approach is valid.
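
    The two selection procedures listed in this comment can be compared by exact enumeration. This is only a sketch under the thread's shared assumption that BB, BG, GB and GG each have probability 1/4; the function names are made up for illustration.

```python
from fractions import Fraction
from itertools import product

FAMILIES = list(product("BG", repeat=2))  # BB, BG, GB, GG, each with odds 1/4

def p_bb_ask_mother():
    """Keep a family if the mother answers yes to 'at least one boy?'."""
    kept = [family for family in FAMILIES if "B" in family]
    return Fraction(kept.count(("B", "B")), len(kept))

def p_bb_check_one_child():
    """Pick one child at random; keep the family only if that child is a boy."""
    kept_weight = Fraction(0)
    bb_weight = Fraction(0)
    for family in FAMILIES:
        for child in family:  # each child is the one picked with odds 1/2
            weight = Fraction(1, 4) * Fraction(1, 2)
            if child == "B":
                kept_weight += weight
                if family == ("B", "B"):
                    bb_weight += weight
    return bb_weight / kept_weight

print(p_bb_ask_mother())       # 1/3
print(p_bb_check_one_child())  # 1/2
```

    Both numbers come out of the same four families; only the rule for keeping a family differs.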


    The »two scenarios« that you are talking about when making the analogy are indeed valid, but the problem is that the »removal of one card« is analogous to »removal of one of the BG/GB options«. Which is what you have to justify doing in the first place!

    When you say »If you remove one card...« what you're saying is »If you remove one (unknown) BG combo...«. In my case, *I'm not* removing any cards! I don't have to, why should I remove one of the cards? I'm still considering all 4 cards (or both BG combos, if we go back to the original problem). The problem that you have is *justifying* the removal of one BG option. Which you are not constrained to do given the original question.


    If you still want to continue with your derivation at the end, yes, I agree with all your calculations. But I have a feeling that at some point you will again steer into *your* approach of arriving at the sample space and then claim that mine is wrong on that basis.

  14. Correction: You are of course justified in removing one combo *if* you use your approach of identifying the gender of one child. In my approach, there is no need (in fact, it would be wrong).

  15. "Here lies the whole point of our disagreement which I'm trying to point out all along. The above statement describes the way *you* are arriving at your sample space: "

    Not exactly. The statement you quoted was not a description of my method of arriving at the sample space; it was a description of the information we were provided with, which I then later *used* to arrive at my sample space.

    "1.) Take a random family
    2.) Identify the gender of ONE child (doesn't matter which)
    3.) If the identified child is a boy, keep the family in the sample space, otherwise discard it"


    That is not actually what I am doing here in these last few posts. What I am doing in these posts is trying to prove to you that the search space elimination method you are using is incompletely accounting for the information content of the statement "one of them is a boy".

    "Correction: You are of course justified in removing one combo *if* you use your approach of identifying the gender of one child. In my approach, there is no need (in fact, it would be wrong)."

    Yes, I know... by definition... since removing the combo would render the answer arrived at using your method incorrect. The idea now is to demonstrate why the combo *must* be removed, not just *can* be removed if you look at things a certain way and interpret the language a certain way.

    And as you have agreed with how the probabilities are properly calculated we can now proceed with doing that.

    So, your method argues that we hear the first part of the statement:

    "I have two children"

    ...and the possible gender combinations of the two children, independently, are at this point:

    Odds of B(1) = 50%
    Odds of G(1) = 50%
    Odds of B(2) = 50%
    Odds of G(2) = 50%

    We declare these values because it is a known fact of biology that the odds of any individual child being a boy or a girl are (roughly) 50/50.

    Your approach then argues that when the statement is made that:

    "one of them is a boy"

    The odds change in a certain way. Show me that change in the above format if you wouldn't mind, and justify the values you use. Replace those four 50% values with the ones that lead to this answer:

    G(1) G(2): 0%
    B(1) G(2): 33%
    G(1) B(2): 33%
    B(1) B(2): 33%

    ...which is the answer your method tells us is valid. Show me the raw numbers that support your search space manipulation.

    As you can see, I am not forcing the calculation down my method's path. It is quite straightforward to do that and arrive at my answer, using the method I have already demonstrated with the four aces example, and the values used are each independently completely rational and consistent... but by all means plug in the numbers generated by your method and let's see what we have and if it makes any sense.

  16. P(GG)+P(BG)+P(GB)+P(BB)=1

    Here we have 4 independent events, all equally probable (P=1/4). Introducing the statement "one is a boy" makes the probability of GG go to zero. That's all it does. The statement "one is a boy" is not true for GG, therefore we eliminate the option.

    0+P(BG)+P(GB)+P(BB)=1

    The remaining 3 possibilities are still equally probable, why on earth would the removal of an independent possibility make the remaining ones unequal?

    P(BG)=P(GB)=P(BB)=1/3

    It's like if I throw a dice:

    P(1)+P(2)+P(3)+P(4)+P(5)+P(6)=1 (ergo, 1/6 each)

    I now introduce a statement that makes one of the above possibilities impossible (like above):

    "I have not thrown a 5."

    P(1)+P(2)+P(3)+P(4)+ 0 +P(6)=1

    Do you agree that the probabilities for the remaining 5 possible outcomes are equal (1/5)? You surely must! Why on earth would eliminating the "5" option have any effect on the balance of probabilities of the remaining outcomes?? The examples are exactly the same. If you disagree, please do the calculation for the dice, I'm really interested in the result (which is not 1/5, apparently).

    One further analogous example would be saying:

    "I have thrown an even number."

    Which makes P(1)=P(3)=P(5)=0 and P(2)=P(4)=P(6)=1/3.

    You don't arrive at these numbers in the above examples by using the original probability (1/6) in some equation. You arrive at these numbers the same way you arrive at the 1/6 for a dice throw: you count the number of possibilities, take them to be equal (because you have no information that would justify one being larger than the other) and take into account that the sum must equal 1. I don't understand why you're being so stubborn about this basic stuff.
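
    For what it's worth, the counting described here gives the same numbers as the standard conditional-probability rule P(A|B) = P(A and B) / P(B). A small sketch, assuming a fair six-sided die; `condition` is a hypothetical helper, not anything from the thread.

```python
from fractions import Fraction

def condition(prior, allowed):
    """Zero out excluded outcomes and rescale the rest so they sum to 1."""
    kept_mass = sum(p for outcome, p in prior.items() if outcome in allowed)
    return {
        outcome: (p / kept_mass if outcome in allowed else Fraction(0))
        for outcome, p in prior.items()
    }

die = {face: Fraction(1, 6) for face in range(1, 7)}

not_five = condition(die, {1, 2, 3, 4, 6})  # "I have not thrown a 5."
even = condition(die, {2, 4, 6})            # "I have thrown an even number."

print(not_five[6])  # 1/5
print(even[2])      # 1/3
```

    Dividing each surviving 1/6 by the surviving mass (5/6 or 1/2) is what turns it into 1/5 or 1/3.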



    The distinction between us arises because I'm using the *whole* family combo as an "event", and you're using "determining the sex of the remaining child" as the event. That's why our results differ, but neither of them is wrong. We're not calculating the same thing, essentially.

  17. All you have done is state the final answer your method arrives at... not what I was asking.

    IF your answer is correct. IF your method of manipulating the search spaces is valid. IF your statement "Introducing the statement "one is a boy" makes the probability of GG go to zero. That's all it does. " is true...

    ...then fill in these blanks:

    Odds of B(1) = __%
    Odds of G(1) = __%
    Odds of B(2) = __%
    Odds of G(2) = __%

    Those values should be determinable. And those values should be independently justifiable.

  18. Are you asking about those odds after I eliminate the GG option?

    I'm left with BG, GB and BB.

    B(1)=2/3
    G(1)=1/3
    B(2)=2/3
    G(2)=1/3

    If I choose a random family from the above 3 (equal) possibilities, these are the odds for the genders of individual children. If this is what you mean. The sum of course is not 1, because these are not mutually exclusive events that make up the search space.
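
    The per-child odds above can be read off the three-family set directly. The sketch below assumes BG, GB and BB are equally likely once GG is excluded, and also checks whether multiplying the per-child odds gives back the family odds. The `marginal` helper is illustrative.

```python
from fractions import Fraction

# Joint odds over (child 1, child 2) after ruling out GG, with the three
# remaining families taken as equally likely (the assumption in this comment).
joint = {
    ("B", "G"): Fraction(1, 3),
    ("G", "B"): Fraction(1, 3),
    ("B", "B"): Fraction(1, 3),
    ("G", "G"): Fraction(0),
}

def marginal(position, gender):
    """Odds that the child at `position` (0 or 1) has the given gender."""
    return sum(p for family, p in joint.items() if family[position] == gender)

g1 = marginal(0, "G")
g2 = marginal(1, "G")
print(marginal(0, "B"), g1)  # 2/3 1/3
print(g1 * g2)               # 1/9
print(joint[("G", "G")])     # 0
```

    The product of the per-child odds (1/9) differs from the family odds for GG (0), so under this reduced set the two children's genders are no longer independent, and a family's odds cannot be recovered by multiplying the individual odds.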

    Again, I don't see why you're focusing on single children, because the events that I'm dealing with are family combinations (GG, BG, GB, BB).

    You have not commented on the dice examples, can you please show me where my reasoning is wrong?

    Look at it this way. The statement "at least one of my children is a boy" is equivalent to the statement "I don't have two girls". Do you agree? If not, please show in what way one contains more information than the other. Please construct an example for which one of the statements is true and the other not (for a two child family). You will surely fail.

    Using the equivalent statement "I don't have two girls" it is *impossible* to distinguish between the remaining three cases (BG, GB, BB) in any way whatsoever. Therefore, the probabilities *must* remain equal.

  19. "Are you asking about those odds after I eliminate the GG option?

    I'm left with BG, GB and BB.

    B(1)=2/3
    G(1)=1/3
    B(2)=2/3
    G(2)=1/3"


    Is that so?

    G(1) x G(2) = 1/3 x 1/3 = 1/9 = 11.1%

    That's how the combo odds are calculated. Your numbers don't match.

    "Again, I don't see why you're focusing on single children, because the events that I'm dealing with are family combinations (GG, BG, GB, BB)."

    Because the combinations are composed of individuals, and the probabilities of each need to be consistent with each other.

    (I am not in disagreement with your dice example. You stated the dice throw was not a 5, the odds of it being a 5 dropped to 0%. No problem, and not inconsistent with anything I have been arguing.)

  20. Sorry, I neglected to respond to this part...

    "Look at it this way. The statement "at least one of my children is a boy" is equivalent to the statement "I don't have two girls". Do you agree?"

    If accompanied by the information that he has two children then one means the other, yes. The problem of course is not that one contains more information than the other, it is that BOTH contain more information than you are accounting for.

    I'll demonstrate by performing the same calculation for my method that I've requested you perform for yours. For either the statement "one of them is a boy" or "I do not have two girls" we have to modify these original values:

    Odds of B(1) = 1/2
    Odds of G(1) = 1/2
    Odds of B(2) = 1/2
    Odds of G(2) = 1/2

    ...to properly account for that data.

    Now whether we're doing that by focusing on the form of the first statement (the odds of finding at least one boy must be 100% in all scenarios)... or doing it by focusing on the form of the second statement (the odds of finding two girls together must be 0% in all scenarios) you're still going to end up with the same answer.

    Having received information about the gender of one child... but not knowing WHICH child... you must account for the possibility of it being either one. So we have two possible situations we may be in (as opposed to the example with the aces, where we had four possible situations we might be in):

    1: Child "#1" is the one that is definitely a boy (or if you prefer... definitely NOT a girl), in which case:

    Odds of B(1) = 1
    Odds of G(1) = 0
    Odds of B(2) = 1/2
    Odds of G(2) = 1/2

    OR

    2: Child "#2" is the one that is definitely a boy (or if you prefer... definitely NOT a girl), in which case:

    Odds of B(1) = 1/2
    Odds of G(1) = 1/2
    Odds of B(2) = 1
    Odds of G(2) = 0

    Now what that leaves us with is the combined possible outcomes of both scenario 1 AND 2:

    1:

    G(1) x G(2) = 0
    B(1) x G(2) = 1/2
    G(1) x B(2) = 0
    B(1) x B(2) = 1/2

    2:

    G(1) x G(2) = 0
    B(1) x G(2) = 0
    G(1) x B(2) = 1/2
    B(1) x B(2) = 1/2

    Total:

    G(1) G(2) = 0 (Of course)
    B(1) G(2) = 1/4
    G(1) B(2) = 1/4
    B(1) B(2) = 2/4

    Odds of two boys? 50/50.

    See?
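
    The two-scenario arithmetic above can be reproduced as follows. This sketch only mirrors the calculation in this comment: it treats the children as independent within each scenario and, as an explicit assumption, weights the two scenarios equally at 1/2 each. The `combos` helper is hypothetical.

```python
from fractions import Fraction
from itertools import product

def combos(p_b1, p_b2):
    """Family odds for independent children given each one's odds of being a boy."""
    odds = {"B": (p_b1, p_b2), "G": (1 - p_b1, 1 - p_b2)}
    return {a + b: odds[a][0] * odds[b][1] for a, b in product("BG", repeat=2)}

half = Fraction(1, 2)
scenario_1 = combos(Fraction(1), half)  # child 1 is the known boy
scenario_2 = combos(half, Fraction(1))  # child 2 is the known boy

# Assumption: each scenario is given weight 1/2 when the two are combined.
total = {key: half * scenario_1[key] + half * scenario_2[key] for key in scenario_1}

for key in ("GG", "BG", "GB", "BB"):
    print(key, total[key])
```

    With those weights the totals come out as 0, 1/4, 1/4 and 1/2, matching the list above.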

    Now, how do you assign probabilities to the genders of the individual children, that accurately reflect the information we have been given, and lead to your answer?

  21. »G(1) x G(2) = 1/3 x 1/3 = 1/9 = 11.1%«

    Now why would you calculate for GG when I EXPLICITLY said that those are the odds AFTER I eliminate the GG option?

    The odds that I gave you were for the said case: If you choose a random family from the set (BG, GB, BB), the 1st children are (B, G, B) and the 2nd children are (G, B, B). Hence the odds.

    You are still insisting on taking »the gender of the individual child« as the event, when I have repeatedly stated that the events considered are »picking a family«. You are constantly dodging this issue and refusing to acknowledge it.

    Again, the B(1)=B(2)=G(1)=G(2)=1/2 are the correct odds for the FULL set of families (GG, BG, GB, BB) ONLY! The same as 1/6 is correct for the full set for a dice throw. When eliminating one option (»5«), we get the probability 1/5. When you show me how you get from 1/6 to 1/5 by using 1/6 in some calculation, apply that to the B/G problem to get from 1/2 to 1/3. It can't be done using 1/2! You clearly see that for the dice, yet you demand from me that I do it for the B/G.

    I will spell out the analogy for the last time (the coin analogy is slightly different in that it doesn't consider the whole remaining sample space):

    1. Full sample space
    a) (GG, BG, GB, BB) (see how the elements are »families« and not children?)
    b) (1,2,3,4,5,6)
    c) All the combinations of a 20 coin toss

    2. Introduce information that reduces sample space
    a) »I don't have two girls« -> remove GG
    b) »I have not thrown 5« -> remove 5
    c) »of the 20 coins, at least 10 landed heads« -> remove those that don't have at least 10H

    3. Ask the question about the probability of an event in the reduced sample space
    a) »what is the probability that the family is BB?«
    b) »what is the probability that I have thrown a 6?«
    c) »what is the probability that the remaining 10 coins all landed H vs. that the remaining 10 all landed T?«

    4. Calculate the probability from the sample space size
    a) Sample size is 3, so if I choose one randomly, the prob. of it being BB is 1/3
    b) Sample size is 5, so if I choose one randomly, the prob. of it being 6 is 1/5
    c) Sample size is 184756 for 10H10T and 1 for 20H, so the prob. of throwing 10H10T is 184756 times higher.

    You seem to agree with the dice and the coin examples, yet you don't with B/G, when they are CLEARLY completely analogous.


    We agree that if we say »the first child is a boy« or »the second child is a boy« the probability is ½. But you claim that if we now say »ONE of the children is a boy (either the first or the second)«, the probability stays the same!

    That's like saying »P(1)=P(3)=P(5)=1/6«, therefore »P(odd)=1/6«. What? For a specific case, the probability is the same as for the general case which includes all the specific cases?

    Finally, if your calculations don't work in real life, they are wrong. For about 75% of 2-child families, the mother can truthfully say »I don't have two girls.« If I randomly choose one from this set of mothers, I have a 1/3 chance that the mother will have 2 boys. Reality proves you wrong. You seem to insist that we remove half of BG/GB options from the sample space. Which ones do we remove? Do we remove Mrs. Jones? Or Mrs. Smith? On what grounds? The statement »I don't have two girls« is true for all of them. You can't say »we remove half of them, but I can't specify which«, that's exactly the point when calculating probabilities: if you can't rule it out, it has a non-zero probability, hence you HAVE to leave it in. This is so trivial that I can not believe how you are not able to grasp it. What if in a real world situation you get a set with 998 GG, 1001 GB, 1000 BG and 1001 BB families? What are the probabilities here? How will you throw half of the BG/GBs out? Will you throw half a family out? Will the sample size for the GB/BG be 1000 and a half?

    This is my last post on the matter, I hope you read it thoroughly. If you don't see your error by now, there's nothing more that I can say. Good luck.

  22. Tomaz, would you think it reasonable if I presented you with a simple equation:

    X x Y = Z

    Then told you that Z was equal to zero, then told you that X and Y were 1 and 4?

    And then I got upset when you said Z was 4 because I just told you that wasn't the answer?

    You have just done the equivalent. It is not my fault you said that GG was eliminated (GG = 0%) then provided simultaneous probabilities of both children being girls that did not result in a 0% value for GG... which is what is required for GG to be eliminated (set to 0%) as a possible outcome.

    As for the individual children vs. families issue, it isn't an issue. The families are composed of individuals. I am not selecting one over the other since they are the exact same thing. The odds of any combination of children occurring is the equivalent of the cumulative odds of the individuals occurring. That is a basic mathematical reality, you cannot change it by declaring it is not so.


    As for this:

    "that's exactly the point when calculating probabilities: if you can't rule it out, it has a non-zero probability, hence you HAVE to leave it in. This is so trivial that I can not believe how you are not able to grasp it."

    Let me repeat the answer I just posted in my last comment:

    "G(1) G(2) = 0 (Of course)
    B(1) G(2) = 1/4
    G(1) B(2) = 1/4
    B(1) B(2) = 2/4"

    Did I remove BG? No I did not, it's still there with a 1/4 probability of occurrence.

    Did I remove GB? No I did not. It's still there with a 1/4 probability of occurrence.

    Now what was that you were saying?

  23. Let's say I have 4000 mothers in my sample: 998 GG, 1000 GB, 1001 BG, 1001 BB. One of the mothers comes forward and says "I don't have two girls." Please calculate (using the above real life sample that is slightly off the ideal distribution) the probability that the mother has two boys. Is the result about 1/3 or about 1/2? The example doesn't need the assumptions of equally probable genders at birth. You already have a real life sample.

    P=1001/(1000+1001+1001)=0.3334
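
    The count above can be checked directly from the sample. For contrast, the sketch also applies the other procedure described earlier in the thread (pick one child at random and keep the family only if that child is a boy); including it here is an editorial assumption, not part of this comment.

```python
from fractions import Fraction

# Sample from the comment: family type -> number of mothers.
sample = {"GG": 998, "GB": 1000, "BG": 1001, "BB": 1001}

# Procedure A: keep every mother who can say "I don't have two girls".
kept = {family: n for family, n in sample.items() if family != "GG"}
p_bb_statement = Fraction(kept["BB"], sum(kept.values()))

# Procedure B (assumed for contrast): pick one child at random from each
# family and keep the family only if that child is a boy. A family's weight
# is then proportional to its number of boys.
boys_seen = {family: n * family.count("B") for family, n in sample.items()}
p_bb_random_child = Fraction(boys_seen["BB"], sum(boys_seen.values()))

print(float(p_bb_statement))     # about 0.3334
print(float(p_bb_random_child))  # about 0.5001
```

    The same 4000 families give roughly 1/3 or roughly 1/2 depending on which of the two selection rules produced the statement.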

    Please, let me see the calculation for the above example which gives you a result close to 1/2.

    Give me a number.

    Please.

    I'm done here.

  24. Ok, so just not going to deal with the way the combo odds are calculated at all and ignore everything I just pointed out to you then? The math is irrelevant?

    All you are doing in your latest example is introducing a small sampling error that you would see in any non-theoretical real world sample. None of that changes the calculation I already performed for you once in any meaningful way. Why ask me to do it again? But if you insist:

    Since the mother has told us she does not have two girls, at least one of the children absolutely must be a boy... and the simultaneous occurrences of girls must be a cumulative 0%.

    Which means we are still, as always, in one of two possible situations, with the odds involved being ever-so-slightly skewed because of your non-perfect sample space.

    1:

    G(1)= 0%
    B(1) = 100%
    G(2) = ~49.975%
    B(2) = ~50.025%

    OR

    2:

    G(1)= ~49.975%
    B(1) = ~50.025%
    G(2) = 0%
    B(2) = 100%

    Now those odds aren't exactly right, since the numbers you chose actually introduced cross dependence between children (for example, G1 is marginally less likely if paired with G2 than if found with B2... and expressing that exactly gets way more complicated than we need to get here) but it's close enough for our purposes.

    Which leads us to:

    1:
    G(1) G(2) = 0 x 0.49975 = 0
    B(1) G(2) = 1 x 0.49975 = 0.49975
    G(1) B(2) = 0 x 0.50025 = 0
    B(1) B(2) = 1 x 0.50025 = 0.50025

    2:
    G(1) G(2) = 0.49975 x 0 = 0
    B(1) G(2) = 0.50025 x 0 = 0
    G(1) B(2) = 0.49975 x 1 = 0.49975
    B(1) B(2) = 0.50025 x 1 = 0.50025

    Totals:

    G(1) G(2) = 0%
    B(1) G(2) = 24.9875%
    G(1) B(2) = 24.9875%
    B(1) B(2) = 50.025%

    There, done. If we were being *perfectly* accurate and not rounding things off to deal with the dependency you introduced then the G(1) B(2) value would be just slightly higher, but that's the outcome to within a decimal place or two of accuracy.

  25. You have totally missed the point of my example. I did not give you the probability of individual children being a boy or a girl, you just ASSUMED it was 1/2. Now do the calculation again WITHOUT assuming a probability of being born a certain gender.

    What if the actual real world probability was 1/3 for a child being born a girl and 2/3 for it being a boy? It is totally IRRELEVANT to the SPECIFIC SAMPLE SET that I gave you in the problem. Are you saying that with the 1/3-2/3 general population odds for gender, the solution for the SPECIFIC SET would be different???

    If I tell you that I have 5 red and 5 blue balls in a sack and you pull one ball out, what is the probability that the ball is blue? You KNOW that it's 50% and for you to calculate this it is totally IRRELEVANT what the ratio of blue/red ball production is in the factory where I got them from.

    You calculate the probability by dividing favorable outcomes by total outcomes IN THE SPECIFIC SET. The same as I did: favorable outcomes for BB are 1001 and total possible outcomes are 1001+1001+1000.

    You just need the numbers of the SPECIFIC case that I gave you. The "birth color" probability for the balls is IRRELEVANT.

    Oh and for the umpteenth time, REALITY disagrees with you. You can do the experiment yourself. Write GG on 25 pieces of paper, GB on 25, BG on 25 and BB on 25. Put all the pieces of paper in a bag. Start pulling papers out and if it has at least one boy, see whether it's BB or one of GB/BG. If it's BB, add 1 to column A. If it is not BB (that is, it's BG or GB), add 1 to column B. Return the paper to the bag.

    After repeating the experiment many times, you will find that column B is about twice as large as column A. ERGO, 1/3 for BB!
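
    The paper-slip experiment is easy to simulate. This is a rough sketch of the procedure exactly as described (draw with replacement, skip GG slips, tally the rest); the function name, draw count and seed are arbitrary choices.

```python
import random

def slip_experiment(draws=100_000, seed=0):
    """Draw slips with replacement; tally only slips showing at least one boy."""
    rng = random.Random(seed)
    bag = ["GG"] * 25 + ["GB"] * 25 + ["BG"] * 25 + ["BB"] * 25
    column_a = 0  # slips reading BB
    column_b = 0  # slips reading GB or BG
    for _ in range(draws):
        slip = rng.choice(bag)
        if "B" not in slip:
            continue  # GG: no boy, so the slip is not tallied
        if slip == "BB":
            column_a += 1
        else:
            column_b += 1
    return column_a, column_b

a, b = slip_experiment()
print(a, b, a / (a + b))
```

    Note that this simulates only the "keep every slip with at least one boy" rule; it says nothing about which rule the original question implies.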

    I don't care about your math, I don't have to care about your math - if your math disagrees with actual real life experiments, it's clearly wrong.

  26. If I had ASSUMED it was 1/2 I would probably have used 1/2, don't you think?

    I used the values you provided. I see no reason to repeat the calculation and arrive at the exact same answer.

    And no, "reality" does not prove me wrong. Assuming your method of constructing and utilizing the search spaces is the correct manner of answering the question posed and calling it reality proves me wrong, but that would happen to be the point in dispute so assuming it's correct would be uncalled for.

  27. I constructed the same sample space as you did in the original post. GG=BG=GB=BB=1/4. These are the four events of the sample space. It all follows from here.

    Please publish your world-of-mathematics-shattering results and prove the rest of the world wrong. We can continue when you do.

    Regards, Tomaž

  28. You constructed, as I constructed, an initial "all possible two child combination" sample space based on us having ZERO information about the genders of the children involved.

    Where our disagreement lies is not in what the appropriate sample space of "all possible two child combinations" is. It is the appropriate sample space of children described by the statement, "I have two children, one is a boy".

    You think the appropriate sample space is to begin from the "all possible" sample space, cross out GG, and make absolutely no alterations whatsoever to the weighting of the rest of the field based on the information provided. I do not.

    And I have shown why I do not.

    And as you are already aware, there is significant dispute over how to properly answer the question already, stating that I believe one of the methods is correct and one is inappropriate is hardly "prove the rest of the world wrong, world-of-mathematics-shattering" stuff. Stop being hysterical.

  29. You are assigning different (weighted) probabilities to 3 balls in a bag. How do you pull one ball out of the bag with a 1/4 chance, and another with 1/2? Seriously, show me an experiment that proves this.


    There are 1001 red balls (BB) in a bag of 3002 balls (BB+GB+BG) and the chance of me pulling a red ball is about 50%? And for one of the 1001 blue balls (BG), it's only about 25%? This is a truly remarkable result worthy of publication.

  30. There are "1001 red balls" in the bag if we accept your approach to constructing the appropriate search space for the information we have been provided.

    All you are doing, over and over, is telling me that if we do it your way we get your answer.

    I know that already. I have spent the last many posts explaining to you why doing it your way is inappropriate.

  31. Perhaps this will help, assuming you're still checking in. After reading through your posts again I need to be sure we have a common understanding of basic principles here.

    Given the following search space (excuse my crude text-table here):

    --------X-------Y------Z
    ---X--..XX.....XY.....XZ
    ---Y--..YX.....YY.....YZ
    ---Z--..ZX.....ZY.....ZZ

    ...what are the odds of a YZ outcome? Humor me.

  32. If we take X,Y,Z to be colored balls (red, blue, green) it depends on how the problem is presented.

    If you have 3 balls in bag 1 and 3 balls in bag 2 and you're asking me what's the probability that I'll get let's say a green1-red2 combo, then it's 1/9.

    But if the bags are indistinguishable from each other, the probability of getting a green-red combination when pulling one ball from each bag is 2/9.

    I apologize for my somewhat frustrated tone in some of my posts. I was, well, frustrated.

  33. Don't worry about it, this particular subject couldn't be better designed to produce frustration.

    I'm afraid your answer, which is the one I expected, is incorrect however. The correct answer is "indeterminable". I didn't tell you what the individual probabilities of occurrence of X Y and Z were and you cannot tell that just by looking at a table of possible outcome combinations.

    This graph:

    --------X-------Y------Z
    ---X--..XX.....XY.....XZ
    ---Y--..YX.....YY.....YZ
    ---Z--..ZX.....ZY.....ZZ

    Tells us only how many possible *types* of outcome we have, NOT how likely any one of them is. You assumed equal likelihood based on how the table looked.

    If I had accompanied that table with the statement that X Y and Z all had 1/3 odds of occurrence then your answers would be correct. If however I had said X is 10% likely, Y is 50% likely and Z is 40% likely the table would look exactly the same but the answer would obviously be different. (Odds of YZ in that order would be 20%... odds of drawing them together unordered would be 40%.)

    My point being that drawing sample space tables of possible outcome combinations is insufficient to determine their odds of occurrence. You MUST know the odds of occurrence of the individual variables that compose the combinations. At all times.
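
    A small sketch of that point, assuming the two draws are independent and share the same odds for X, Y and Z. The `pair_odds` helper is made up for illustration.

```python
from fractions import Fraction

def pair_odds(p, first, second):
    """Odds of two independent draws, as an ordered pair and in either order."""
    ordered = p[first] * p[second]
    unordered = ordered if first == second else 2 * ordered
    return ordered, unordered

uniform = {"X": Fraction(1, 3), "Y": Fraction(1, 3), "Z": Fraction(1, 3)}
skewed = {"X": Fraction(1, 10), "Y": Fraction(1, 2), "Z": Fraction(2, 5)}

print(pair_odds(uniform, "Y", "Z"))
print(pair_odds(skewed, "Y", "Z"))
```

    The same 3x3 table yields 1/9 and 2/9 for YZ with equal odds, but 1/5 and 2/5 (20% and 40%) with the 10/50/40 split.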

    Ok?

  34. Sure, you are correct, in general the probabilities are indeterminable. However, for the specific case that I described, the probabilities are correct. When drawing balls from a bag, the probabilities must be equal for all balls in the bag, do you agree?

    The key here is that I consider the boy/girl problem to be analogous to the balls in a bag case. You seem to disagree and in my mind you have not yet justified why the analogy does not hold.

    If we start with BB=BG=GB=GG=1/4 (with which you agree) and assign colored balls (blue, red, yellow, green) to each case we have a 1/4 chance of pulling a ball of a particular color (if we start with 1/4 balls of each color). Even before removing any balls (families), the probability of drawing a boy/girl combo is 1/2 (twice the probability of BB), because boy/girl is represented by balls of two colors (red, yellow) and BB by one (blue).

    So now if I pull a random ball out of the bag (1/4 prob. for each color) and I tell you that I have not pulled a green one (GG), what are the odds that I drew a blue one (BB)?

    You know that I either pulled red, yellow or blue. They are the only ones possible. Furthermore, you know that there is an equal amount of red, yellow and blue balls in the bag. So how can you come to any other conclusion than that the prob. is 1/3? How can you have unequal probabilities for balls in a bag?

    I assure you that if we do the colored balls experiment and every time I pull a ball out, I tell you a color that I didn't pull, you will guess the color I pulled 1/3 of the times on average.
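
    The count behind that 1/3 can be sketched as below. It assumes exactly what this comment assumes: the draw is one of four equally likely family "balls", and the only information given is "not GG". The variable names are illustrative.

```python
from fractions import Fraction

# One "ball" per two-child family type, each assumed to make up 1/4 of the bag.
bag = {"BB": Fraction(1, 4), "BG": Fraction(1, 4),
       "GB": Fraction(1, 4), "GG": Fraction(1, 4)}

# Condition on the statement "the ball drawn is not GG".
not_gg = {family: p for family, p in bag.items() if family != "GG"}
total = sum(not_gg.values())

# Renormalise: share of the remaining probability that belongs to BB.
p_bb_given_not_gg = not_gg["BB"] / total
print(p_bb_given_not_gg)  # 1/3
```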

    If you consider pulling a ball (or a family, a combo) as an event, this is the only conclusion possible. There is, of course, another way of looking at the problem if you say that the gender of one child is fixed (boy) and we are speculating on the gender of the other child. Here, the gender of the other child is the "event". You cannot mix the two problems, they are different. Both legitimate, given the ambiguous question.

    You seem to be jumping from "family" event to "gender" event in the middle of your reasoning to justify why 1/3 is wrong.

  35. Sure the probabilities are equal for all balls in the bag. But you can't assume an equal number of X Y and Z balls present in the bag in the first place. The number of each type of ball in the bag is a function of its individual probability of occurrence.

    "The key here is that I consider the boy/girl problem to be analogous to the balls in a bag case. You seem to disagree and in my mind you have not yet justified why the analogy does not hold."

    You actually just made a comment after this that illustrates that very thing perfectly:

    "So now if I pull a random ball out of the bag (1/4 prob. for each color) and I tell you that I have not pulled a green one (GG), what are the odds that I drew a blue one (BB)?"

    GG is not analogous to a ball. Nor is BB. BB represents the combination of two independent events (two different children both being born male). A "G" is a ball. A "B" is a ball. A BB is two balls, not one.

    This is the same problem as we saw with the "YZ" example. You saw that combination occupying one slot in the search space and treated it as a single discrete event which caused you to make unwarranted assumptions about its probability of occurring. Probabilities of multiple independent events are not determined by how many combinations are possible, they are determined by the probability of the individual events that comprise each combination.

  36. What do you mean I can't assume an equal number of balls in the bag, of course I can. I can assume that based on the BB=BG=GB=GG=1/4 equality. You agree on this 1/4 chance. From this equality I can definitely assume an equal number of balls (or family combos).

    You seem to be stuck on this "combination of two independent events". Yes, we can regard a combo like that, too. But once we get to BB=BG=GB=GG=1/4, how we got there is irrelevant. This is our starting point, that each combo is 1/4 of the general population. This is the only thing that matters from this point on.

    "Probabilities of multiple independent events are not determined by how many combinations are possible, they are determined by the probability of the individual events that comprise each combination."

    Yes, that's how we get from B=G=1/2 to BB=BG=GB=GG=1/4.

    But the point is that your 'individual events' here are B and G and they are equally probable for a given birth (1/2), so in your table the number of occupied slots is therefore *actually* proportional to the probability of that combined event happening (if you regard X and Y to be B and G). Again, that's how we get to BB=BG=GB=GG=1/4, on which we both agree.

    Anyway, there's nothing wrong with assigning a ball to a combo. Think of it as assigning a ball to a mother:

    Based on BB=BG=GB=GG=1/4 (which is based on B=G=1/2) we get a list of 100 mothers, 25 of each (a sample of the general population). I don't know which mother has which combo of children. I select a mother from the list randomly and I ask her: "Is one of your children a boy?" If she says yes you then bet on her being a mother of two boys and I bet that she also has one girl. You will win the bet 1/3 of the time on average.

    This can easily be empirically verified, why don't you try it?
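
    A rough simulation of that bet, as a sketch only. It assumes the procedure exactly as described in this comment: a mother is picked at random from all two-child families, each child is independently a boy or a girl with probability 1/2, and she is asked whether at least one child is a boy. The function name `mother_bet` and the fixed seed are illustrative choices.

```python
import random

def mother_bet(trials=100_000, seed=0):
    """Fraction of 'yes' mothers who turn out to have two boys."""
    rng = random.Random(seed)
    yes_answers = 0
    two_boys = 0
    for _ in range(trials):
        # Each child is independently a boy or a girl with probability 1/2.
        children = [rng.choice("BG"), rng.choice("BG")]
        # The mother answers "yes" if at least one child is a boy.
        if "B" in children:
            yes_answers += 1
            if children == ["B", "B"]:
                two_boys += 1
    return two_boys / yes_answers

print(round(mother_bet(), 3))
```

    Under these assumptions the printed value should land close to 0.333. The sketch says nothing about other ways the statement "one is a boy" might have come about.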

  37. "What do you mean I can't assume an equal number of balls in the bag, of course I can. I can assume that based on the BB=BG=GB=GG=1/4 equality. You agree on this 1/4 chance."

    I agree on that one quarter chance for the statement "I have two children". I do not, nor have I ever, agreed on that for the statement "I have two children, one is a boy".

    "But the point is that your 'individual events' here are B and G and they are equally probable for a given birth (1/2), "

    They are equally probable for any given randomly selected child in the general population. However we have a modifying statement. One of our two children must be a boy.

    "Based on BB=BG=GB=GG=1/4 (which is based on B=G=1/2) we get a list of 100 mothers, "


    Yes, I know. But BB=BG=GB=GG=1/4 is the search space for the statement "I have two children", NOT the search space for the statement "I have two children. One is a boy." We're not looking at the former 100 mothers, we're looking at the latter.

  38. Grant, let me try some other ways to explain to Tomaz (assuming he may still bother to check here).

    Tomaz,

    The thoughts that could help explain why 1/2 is the correct answer are:

    1. "Among families with two children with at least one boy, what is the chance that I am the parent of two boys?" is a different question from, "I have two children. One is a boy. What is the chance that the other is also a boy?"

    2. There is one possible way a household with one boy and one girl could be in the universe to be considered. There are two possible ways in which a household with two boys could be in the universe to be considered.

    3. Whether we visualize the situation as elder child and younger child or left-right, the question being asked is about THE OTHER CHILD. One child has already been exposed, or thought of, or mentioned. This exposed child could be the only one in the boy+girl and girl+boy cases but could be one of the two boys in the boy+boy case. Thus the denominator becomes 4 and not 3 and the numerator is 2 giving the probability of two boys as 2/4 = 1/2.

    4. Your description of taking 100 two-child families and after eliminating girl+girl families, one third of the remaining families having boy+boy is correct. However that is not the appropriate scenario for the probability question being asked because you are going back to considering two-child families as a whole. The child whose gender was revealed cannot "participate" in determining the gender of the other child. Though not very sure, I think the confusion arises due to applying joint probability instead of conditional probability. Subject to knowing that one of the children is a boy, what are the chances that the other is also a boy?

    METHOD TO VISUALIZE THE SITUATION

    Imagine the possible four combinations as four small shacks completely hidden behind a wall. The four shacks contain (G, G), (G, B), (B, G) and (B, B). At this point if one asks what is the probability of a shack containing two boys, the answer is 1/4. Now, one child from one of the shacks walks out through a narrow doorway in the wall. It happens to be a boy. The question asked is, what is the probability that the other child in that shack is also a boy? There are four possible boys who could have walked out:

    The boy from the (G, B) shack
    The boy from the (B, G) shack
    The boy near the door of the (B, B) shack
    The boy away from the door of the (B, B) shack

    In the last two cases, the other child is a boy, in the first two cases, the other child is a girl.

    Yes, there are only three shacks and one third of the shacks contain a lone boy but that is not the question. The question is the probability of the lone child being a boy and there are four possible lone children cases to consider.
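
    The shack picture can be counted out as in the sketch below. It assumes what the picture assumes: each of the eight children behind the wall is equally likely to be the one who walks out. The variable names are illustrative.

```python
from fractions import Fraction

shacks = [("G", "G"), ("G", "B"), ("B", "G"), ("B", "B")]

# Every (shack, position) pair is one child who could walk out; all eight
# children are assumed equally likely to be that child.
walkers = [(shack, i) for shack in shacks for i in range(2)]

# Keep only the cases where the child who walked out is a boy.
boy_walkers = [(shack, i) for shack, i in walkers if shack[i] == "B"]

# Of those, count the cases where the child left behind is also a boy.
other_is_boy = [(shack, i) for shack, i in boy_walkers if shack[1 - i] == "B"]

print(len(boy_walkers), len(other_is_boy))            # 4 2
print(Fraction(len(other_is_boy), len(boy_walkers)))  # 1/2
```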

    Hope this is useful. In any case, I salute the persistent thirst of knowledge shown by both of you :-)

  39. RG,
    I agree that your interpretation of the problem is valid. The problem is that given the original statement

    "Mr. Smith has two children. At least one of them is a boy. What is the probability that both children are boys?"

    we don't know how we obtained information about Mr. Smith's family. From the Wikipedia article:

    "In this case the critical assumption is how Mr. Smith's family was selected and how the statement was formed. One possibility is that families with two girls were excluded in which case the answer is 1/3. The other possibility is that the family was selected randomly and then a true statement was made about the family and if there had been two girls in the Smith family, the statement would have been made that "at least one is a girl". If the Smith family were selected as in the latter case, the answer to [above] question is 1/2."

    You just assumed that the statement was made after ONE child was considered. I argue that the above statements can also be made when we randomly select 2-child families as a whole. This is not inconsistent with the original statement. We don't know how the statement was made so we are free to assume how this information was obtained.

    Both ways are valid and both answers are valid. The question is ambiguous, which is nicely explained in the article.
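
    The two selection procedures in the quoted passage can be worked through side by side, as a sketch. The four family types are assumed equally likely. For the second procedure the quote does not spell out what a mixed family says, so the code assumes such a family mentions a boy half the time; that 1/2 and the helper name `p_says_boy` are illustrative assumptions.

```python
from fractions import Fraction

families = ["BB", "BG", "GB", "GG"]  # each assumed to have probability 1/4
prior = Fraction(1, 4)

# Procedure 1: two-girl families are excluded, so every remaining family
# can truthfully say "at least one is a boy".
kept = [f for f in families if "B" in f]
p_exclusion = Fraction(kept.count("BB"), len(kept))

# Procedure 2: a random family makes a true statement about itself.
# Assumption: BB always says "boy", GG never does, mixed families half the time.
def p_says_boy(family):
    return Fraction(family.count("B"), 2)

# Total probability of hearing "at least one is a boy", then Bayes' rule for BB.
p_statement = sum(prior * p_says_boy(f) for f in families)
p_random_statement = prior * p_says_boy("BB") / p_statement

print(p_exclusion, p_random_statement)  # 1/3 1/2
```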

    I conceded this fact in my original post. I was NOT arguing with Grant that 1/3 is the ONLY possible answer, I was arguing that it is a VALID answer given a certain interpretation of the problem.

    Grant was not arguing with me whether my interpretation is valid or not, he was arguing that even GIVEN MY INTERPRETATION, the answer is still 1/2. Which is wrong.

    If you want to show that 1/3 cannot be *a* correct answer, you need to show that we CANNOT arrive at the statements

    "Mr. Smith has two children. At least one of them is a boy."

    by considering whole families (i.e. taking a set of families for which the parents can truthfully say "at least one of my children is a boy"). Which you can't. Both 1/2 and 1/3 are possible, given just these statements while not having information on how we obtained them.

  40. (Sorry about the delay getting that last comment up, I wandered away to other places on the interwebs for a while and wasn't paying attention.)

    I'm not going to re-hash my reply... as I already responded to the claim that you can pick and choose how the family was selected to get different answers, and showed why it's incorrect, back on July 22.

  41. Wow. Is it really necessary to write such a lengthy post? There are two possibilities for the other child. 50% chance. The end?
