Twigs of Yore: Visualising Ancestry DNA matches

Blog post

Saturday, July 15, 2017

Visualising Ancestry DNA matches–Part 4-Updating

This is the fourth part of a series about using a free Excel template, NodeXL Basic, to visualise Ancestry DNA match lists. For previous posts, see the index to the series.

So you’ve made a chart and it’s just the way you want it. The next thing you know you have brand new interesting DNA matches and you’re dying to know how they fit in. Once you get used to it, creating a new graph is quite quick and easy. Or…. you could save a little time and upload more information into the same chart.

How new data is treated

Before you load more information into an existing chart it’s useful to understand what will happen.

Matches

New matches are appended to the end of the Vertices sheet.
Data for existing matches is overwritten (where the columns have the same name).
Columns that were not present before are added.
Columns that were present before that are not re-imported are unaffected. The existing data will still be there. New matches will have no values in that column.
Matches who don’t appear in the new data are unaffected – you won’t lose them.

Relationships

All the relationships in the new file will be added to the end of the list on the Edges sheet.
This will duplicate existing relationships. Clean up is necessary.

This has some exciting implications for the kinds of things we will be able to do with the charts. I’m dying to get up to the posts where I’ll show them to you! One thing at a time…

Clearing duplicates

To make things easier on your computer, you should clear away duplicate relationship entries (edges) each time you reload data into an existing worksheet:

On the NodeXL ribbon select Prepare Data, Count and Merge Duplicate Edges
Clear the Count duplicate edges box
Tick the Merge duplicate edges box
Select Vertex 1 and Vertex 2 as the columns that contain duplicates.
Click OK.

If you have a long in-common-with list you should take this step now, even if you haven’t reimported data.

When you first import data, one ‘Vertex’ row is created for each person, and one ‘Edge’ for each relationship. For fourth and closer cousins who match each other, two relationships are recorded: ‘A matches B’ and ‘B matches A’. Excel’s Remove duplicates button won’t help because it doesn’t recognise these pairs of reciprocal relationships as duplicates. Each duplicated relationship adds a little to the processing load for your computer.

When you clear duplicates, the first duplicate entry starting from the top is kept and any others discarded. Keep this in mind if you use any of the other edge columns – you may want to choose “Vertex 1, Vertex 2 and this column”, or to sort the edges before clean up so that the ones you want to retain are at the top.

Updating data – the process

To update a worksheet with fresh data, use the following steps.

Save a copy first in case something goes wrong.

Import new in-common-with file – see Part 2
Import new match file – see Part 2
Clear duplicates – see above.
Recreate groups – see Part 2
Autofill columns – see Part 3

If the autofill process hasn’t caused your graph to refresh, then Refresh Graph now.

Note: If one or more of your new matches has a very close relationship to you and should be ‘skipped’, add them to the Additional Input file and reload that as well (see Part 2 for a reminder how to load it). Do so any time before step 3 above. Alternatively you can enter ‘Skip’ manually on the Vertices sheet.

Coming up…

I know a lot of people are interested in ways to work with a busy chart – that post will be next, I promise!

9 comments:

UnknownJuly 16, 2017 at 2:11 AM
This method is seemingly magical. The clusters contain not only people who I knew should be grouped, but many who previously appeared to fit nowhere at all. NodeXL, combined with your expertise, seems to have created information out of thin air!
ReplyDelete
Replies
KerryJuly 16, 2017 at 8:17 AM
Thank you, thank you, thank you! I have a fairly endogamous group, so my graph is a bit messy, but the outliers are who I wanted to indentify, and this is a huge help.
ReplyDelete
Replies
UnknownJuly 19, 2017 at 7:04 AM
Shelley- Since the names on a dense graph fall all over each other, even Zoomed in with scale set to small, it is hard to see the most important result of this project: which matches fall into each cluster, and therefore are related to each other. Obviously, NodeXL knows which vertices go in each cluster. It also knows what color they are, etc. There must be a way to extract cluster lists from the spreadsheet. Have you done that, or do you know how to do it?

Thanks!
ReplyDelete
Replies
UnknownJuly 20, 2017 at 3:47 PM
Hi Shelley
I love your information & have almost finished setting up graphs for all of my DNA kits (I have a few). I have played with the imports & have added the 'hint' to my import - this has been great as I can group them all together to start 'seeing' where the family groups are. I have also started reordering the CM size from largest to smallest, this has been good. I also started adding the Skip function to my data once as a graph to weed out known maternal or paternal sides, also very interesting.
Looking forward to future posts.
Naomi
ReplyDelete
Replies
Working MumJanuary 5, 2019 at 4:30 AM
Hi Shelley,

My husband and I have been having fun with this, but I seem to have hundreds of matches that are not in any of the clusters - or at least I can't see them. We wondered if they are left off if they are not clustered with anyone or if the match was tiny. Is there any way (I'll try the suggestion above?) to see which groups these 'stragglers' are linked to? Typically, a lot of them have the best trees! Cheers, Ali
ReplyDelete
Replies

Pages