Blog post

Friday, July 7, 2017

Visualising Ancestry DNA matches - Part 3 - Navigation and Presentation

This is the third post in a series about Visualising Ancestry DNA Matches. In previous posts we got ready, and loaded the files. In this post I’ll show you how to get around your graph, and provide some options to adjust the appearance of the chart so that it can be more easily understood.

To begin, open up the file you saved at the end of the last post. You won’t see the graph you created.

Don’t Panic.

Click Show Graph and your work will reappear with your last settings intact.

image

Navigation

Jump to a person

I suggested previously that you should move back to the Vertices worksheet and try clicking on some dots. If you haven’t done so, try it now. You’ll find that when you click on a dot the appropriate line on the Vertices worksheet is highlighted. If you scroll right on that worksheet you will see the person’s name, kit administrator and shared cM. You will also find that the ‘matchURL’ field contains a clickable hyperlink to your match page with that person. Very handy!

It works the other way as well. If you select a line or lines on the worksheet, the corresponding dot (or dots) will highlight in red. Note: You may have to click a few times to see this. Only a small proportion of all your matches are on the graph as it only displays people who have shared match information.

Hide excess columns

Scrolling to the right every time can be a bit annoying. We can quickly hide some of those excess columns. On the NodeXL Ribbon, click the Workbook Columns button. Here you can hide or show the ‘Visual Properties’ and ‘Labels’ columns as you wish.

image

Take a closer look

The controls that will help you get around the chart itself are at the top of the graph display area.

image

  • The Arrow button allows you to make selections on the chart.
  • The + and – magnifying glasses will zoom the image in or out, as will the Zoom slider.
  • When you are zoomed in, the Hand button will let you move to different parts of the graph. If you can’t select dots, it’s probably because you’ve left the hand button active.
  • The Scale slider leaves the graph the same size, but will make everything on it (dots, line width, labels) smaller.
  • Notice most buttons have usage tips that will appear when you hover over them.

Have a play with the controls.

Presentation

NodeXL allows for a lot of customisation. We’re going to give our graphs a makeover! We’re going to try on different layouts, emphasise our closer cousins and accessorise with carefully chosen labels. By the time we’re finished those frumpy scribbles will be elegant figures wearing designer labels.

We’re aiming for before and after shots something like this:

imageimage

Layouts

So far we’ve stuck with the default layout algorithm. There are other layouts to choose from. When I first tried NodeXL I was suffering from a bad case of DNA circle envy, so I chose circle layouts. They worked well with small groups of matches. Since then I’ve acquired more matches and have settled on the ‘Harel-Koren Fast Multiscale’ option (used in the ‘after’ image above).

Layout options are available on the graph area toolbar and on the NodeXL Ribbon.

image

  • Select a layout option from the drop down list.
  • Each time you select a different layout option NodeXL forgets that you want to keep your groups in separate boxes. Remind it by opening up Layout options… (same menu, bottom item). It seems to retain the options you last set, so just click OK.
  • To apply the new layout, click Lay Out Again.

Go ahead and try different layouts out until you find one that works well with your data.

Dot size and labels

I’ve adjusted the dot sizes on my charts to correspond with the sharedCM value – bigger dots are closer cousins. I’ve also applied labels so I can see who is who without moving back to the vertices worksheet. When I hover over a dot, a tooltip appears with whatever note I had entered on the person’s Ancestry match page at the time I downloaded the file.

All this can be done very easily using options found under just one button.

Click the Autofill Columns button on the NodeXL Ribbon.

image

The dialog below will appear. This dialog will write values in the ‘Visual Properties’ columns and ‘Labels’ column based on the columns you choose.

  • Set Vertex Label to ‘name’
  • Set Vertex Tooltip to ‘note’
  • Set Vertex Size to ‘sharedCM’ – then click on the Options button on the right.

image

The Vertex Size options let you decide how big or small the dot representing each person should be based on numerical values in the column you select.

The settings shown below worked well for me. You may be quite happy to leave the smallest number as “The smallest number in the column”. I increased the number to 10 so that I could tell the difference between my closer cousins and everyone else more easily. The number 30 worked well for me as the upper limit (anyone with a sharedCM of 30 or more will be drawn at the maximum size). Experiment and see what works for you.
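If it helps to see the arithmetic, here is a rough Python sketch of what those two limits do. It assumes the sizes are scaled in a straight line between the lower and upper numbers, which may not be exactly how NodeXL does it. The function name and the output size range of 1 to 10 are made up for illustration; they are not NodeXL settings.

```python
def vertex_size(shared_cm, low=10.0, high=30.0, min_size=1.0, max_size=10.0):
    """Map a sharedCM value to a dot size (illustrative only).

    low/high mirror the 10 and 30 entered in the Vertex Size options.
    min_size/max_size are hypothetical dot sizes, not NodeXL defaults.
    """
    # Anything below `low` or above `high` is pinned to the nearest limit.
    clamped = max(low, min(high, shared_cm))
    # Position of the clamped value between the two limits, from 0.0 to 1.0.
    fraction = (clamped - low) / (high - low)
    return min_size + fraction * (max_size - min_size)


# A few sample sharedCM values: distant, borderline, and close cousins.
for cm in (6.2, 10, 20, 30, 150):
    print(cm, vertex_size(cm))
```

Under these assumptions, everyone sharing 10 cM or less gets the smallest dot, everyone sharing 30 cM or more gets the largest, and only the people in between vary in size. That is why narrowing the range makes the closer cousins stand out.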

To get out and apply the settings:

  • Click OK on the ‘Vertex Size Options’ box
  • Click Autofill on the ‘Autofill Columns’ box.
    The information will be written into the appropriate columns and the settings applied immediately.
  • Click Close on the ‘Autofill Columns’ box

image

Scale the features

By now you should have graphs that look something like this:

image

It’s a bit cluttered and hard to see what’s going on. Use the Scale slider to adjust the dots and labels to suit the Zoom level you are using.

image

Here’s a closer look at the same group with a Zoom of 200 and a Scale of 40.

image

Don’t forget to save!

What can we do with this?

These graphs show DNA matching relationships in the Ancestry DNA data.

  • Each dot represents a person on your DNA match list.
  • The bigger the dot, the more shared DNA they have with you.
  • Each line represents a relationship between two people who are estimated to be fourth cousins or closer to each other (at least one of the two people must be estimated fourth cousin or closer to you).

When we look at it this way, we can see linkages that are not visible on the Ancestry DNA shared match pages. I can think of dozens of scenarios where this sort of information could lead to valuable clues.

For example:

  • On ‘Cousin K’s’ shared match page, I can see ‘Cousin O’ and ‘Cousin I’.
  • I don’t see ‘Cousin S’ or ‘Cousin T’ who are distantly related to me, but more closely related to Cousin K.
  • ‘Cousin S’ is an (estimated) distant relative of mine, but must be a fourth cousin or closer to both ‘Cousin K’ and ‘Cousin I’ for the connecting lines to show.
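The example above can also be sketched in a few lines of Python, for anyone who prefers to see the logic written out. The edge list and the set of close cousins below are invented to match the Cousin K story; they are not real data, and the helper names are my own, not part of NodeXL.

```python
from collections import defaultdict

# Hypothetical edges: each pair is a line on the graph.
edges = [("K", "O"), ("K", "I"), ("K", "S"), ("K", "T"), ("I", "S")]

# Hypothetical set of people estimated to be fourth cousins or closer to me.
# Only these people would appear on a shared match page.
close_to_me = {"K", "O", "I"}

# Build a lookup of who is connected to whom.
neighbours = defaultdict(set)
for a, b in edges:
    neighbours[a].add(b)
    neighbours[b].add(a)


def hidden_links(person):
    """People linked to `person` on the graph who are not close to me."""
    return sorted(neighbours[person] - close_to_me)


def common_neighbours(a, b):
    """People linked to both `a` and `b`."""
    return sorted(neighbours[a] & neighbours[b])


print(hidden_links("K"))            # ['S', 'T']
print(common_neighbours("K", "I"))  # ['S']
```

With this made-up data, Cousins S and T turn up as links to Cousin K that the shared match page would not show, and Cousin S is the only person connected to both K and I.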

Suppose the key to my connection with fourth Cousins ‘K’ and ‘I’ happens to lie with Cousin ‘S’? If Cousin ‘S’ doesn’t have a public tree linked to their DNA kit, no amount of searching for names or places will find them. As I have thousands of DNA matches on Ancestry, I’m unlikely to make my way to their page, which will be well back in my results – let alone contact them if I have nothing else to go on.

Whether you’re taking a paper trail or a segment matching approach to your DNA matches, it helps to know which of your thousands of matches might be relevant to a particular problem.

Now that I’ve visualised the relationships this way, I know that Cousin S exists and that it could be worthwhile contacting them.

14 comments:

  1. fascinating thank you - I've been doing something quite similar manually, but this procedure should show some connections I haven't been able to find. - love that the dots indicate cM shared.

    1. Thanks Nancy, there are so many options!

  2. Shelley,

    I want to let you know that three of your blog posts are listed in today's Genealogy Fab Finds post at http://janasgenealogyandfamilyhistory.blogspot.com/2017/07/janas-genealogy-fab-finds-for-july-7.html

    Have a great weekend!

  3. I've analysed 2 kits using this tool and all going well! Thank you for bringing it to people's attention.

  4. Great work Shelly!
    I just wish Ancestry gave us more information so we could see how closely the cousins are related, so the thickness of the lines could indicate their relatedness (cM). I am thinking along the lines of Tufte's graphical display of quantitative information....GEDmatch could produce the data.....would NodeXL Basic be able to add this extra dimension if provided with that information, or would NodeXL Pro be needed?

    1. Thanks Erik.
      The Basic version can display different edge thickness, opacity, style (eg dots and dashes) and colour. SO much possibility!

  5. Thanks for this Shelley. The weekend has been filled with living family activities - I'll return to the dead on Monday.

  6. Thanks Shelly. This opens new approach for me to handling the "BIG" data that genetic genealogy is generating. One question, is there a way to autofill the group label into a column in the Vertices tab?

    1. No way built into NodeXL (that I have found). You would need to use other methods. Probably the easiest would be to copy and paste the information from the Group Vertices tab into a new workbook. Add a column to that and put your own match ID into the new column all the way down. Import the new worksheet as you have before, with the new column (filled with your own ID) as Vertex 1, the "Vertex" column holding matchIDs as Vertex 2, and the Group Label as a Vertex 2 Property. This will do what you want but has the side effect of adding extra, unnecessary edges - use 'Prepare Data', which is located below the Import button, to get rid of them. I hope that makes sense!

    2. Thanks. Copy and paste is the approach I used to create a new workbook. The groups can then be searched to ID the branch of the tree. I hope. I am still working to understand the grouping algorithm.

  7. Thank you for this excellent series! I'm a little confused, though. Why would a woman and her aunt be in separate groups? I know I don't understand the grouping algorithm, but that seems odd.

    1. Hi Cheryl, those algorithms weren't written with DNA connections in mind and we haven't added any information about the strength of the relationship between each pair of people to help them. Given that, they mostly seem to do a reasonable job but we've got to check the results, as you have.

  8. Another thank you Shelley :)
    I am still getting my head round the whole DNA 'thing' but your series of posts is helping me so much.
