Blog post

Tuesday, August 22, 2017

Researching Abroad Roadshow–Canberra

Yesterday I attended the Researching Abroad Roadshow. Canberra’s event was one day only, with the British Isles and German/European streams running in different rooms.

I chose the British Isles stream, as it reflects my ancestry. We started the day with Scottish land records, and Scottish research resources before 1800 and after lunch moved on to Irish family history resources online and “Down and out in Scotland”.

When a speech focuses on types of records there’s a danger that the presenter will spend a lot of time rattling off lists. I’ve seen it happen before. Fortunately, this this was not the case yesterday. Chris Paton was an engaging speaker with plenty of examples that related the records back to the real people and events they describe.

I had looked at some of the resources that were covered before, but not in any depth, and others were completely new to me. I now feel that I have a head start on knowing where to look and what I might find when I’m ready to dig into Scottish and Irish research. Learning how to pronounce all those Scottish and Irish words might take a bit longer!

Chris very kindly indulged me with a quick selfie as he was racing off for the airport.

selfie

The Roadshow has two more stops, in Adelaide and in Perth. Get to it if you can!



Disclosure: In return for acting as a Roadshow Ambassador I received free entry to the event.


Wednesday, August 16, 2017

Visualising Ancestry DNA matches-Part 9-Combining kits

By now those of you playing along will have created a network analysis workbook using the NodeXL template, loaded your Ancestry DNA information, broken the tangle of matches into groups, experimented with the settings and found out how you could add additional relationships. Phew! See the index to previous posts if you’re just joining in.

Now the real fun begins!

A few readers have asked if it’s possible to combine kits together. The answer is Yes! Combining kits in one file is almost as easy as loading your own information, and can be very useful.

This post assumes that you manage more than one kit, or that the owner of another kit has provided you with their files. It also assumes that your kits aren’t so large that loading more information will make the file unworkable. Save before you try it. I manage two kits at present but you can add information for as many kits as you think your computer will handle.

I’ve loaded my kit and my father’s kit into one worksheet. A simple edit to the matches file before loading created a new column for my father’s sharedCM values.

image

I did some quick calculations to find out how many matches we have in common. I match 50% of my father’s 4th or closer cousins. Including all the distant cousins we have a combined total of 18,889 matches – only 15% of the grand total is shared. Exercise caution if adding distant cousins!

In-common-with file

The in-common-with file will add lines representing DNA connections to your graph.

Loading an in-common-with file will also add people who are related to the additional kit’s subject. If your goal is to research the family tree of the focus person (‘you’), the best kits to load are those belonging to relatives who have some of the same ancestors as you, but no ancestors that you don’t have.

Many of the people you ‘skipped’ are prime candidates:

  • Full siblings
  • Parents
  • Aunts and uncles
  • Grandparents

This doesn’t mean that you should never load the in-common-with file for someone who has ancestors you don’t. Combining a kit with a half sibling may help you work out which matches are ‘yours, mine, or ours’. 

If you don’t load the in-common-with file you can still load the matches file to place the sharedCM values side by side as I have.

Load the ICW file

Loading the in-common with file for additional kits is easy. Simply load it in exactly as you have done before.

  • NodeXL basic ribbon, Import button, From Open Workbook…
  • Select the file in the top box
  • Under Is Edge Column tick ‘matchid’ and ‘icwid’
  • Which edge column is Vertex 1: matchid
  • Which edge column is Vertex 2: icwid


Matches file

When loading matches for an additional kit the data loaded for shared matches will overwrite existing data.

The name and admin columns have the same information regardless of which kit they match so nothing is lost by reimporting these for another person. In fact, it’s better if you do import them, otherwise you won’t know who the new matches are. 

Columns such as range, sharedCM, note and matchURL differ from kit to kit. If you want to import any of these columns (I’d import sharedCM at minimum) you’ll need to make a few minor edits to the import file first.

Prepare the matches file

  • Open the match file m_AdditionalKitName.csv
  • Save a copy with a different name. m_AdditionalKitName_edited.csv will do.
  • The matchid, name and admin columns should be left alone.
  • For any other column you want to import, change the column header to indicate whose information it is.
    For example, ‘sharedCM’ might become ‘sharedCM John’. Keep it simple because next time you update the file you’ll need to enter it in exactly the same way.
  • Choose the first value in the testid column and change it to ‘zzz delete’. Then double click on the little square in the corner of the cell to copy it all the way down the sheet. This step isn’t strictly necessary but it only takes a few seconds and will make it easier to remove extra lines not needed for the graph.
    image
  • Save the file, but don’t close it yet.


Load the matches file

  • NodeXL basic ribbon, Import button, From Open Workbook…
  • Select the file in the top box
  • Under Is Edge Column tick ‘zzz delete’ and ‘matchid’
  • Under Is Vertex 2 Property Column tick:
    • name
    • admin
    • any other columns you wish to import (remember if the column name matches a column already present the information will be overwritten)
  • Which edge column is Vertex 1: zzz delete
  • Which edge column is Vertex 2: matchid


Remove unwanted matches

If you decided not to load the in-common with file, you may prefer to remove matches who don’t share DNA with you. You’ll find them at the bottom of the Vertices sheet. There won’t be any information in your own sharedCM column for those people.

Housekeeping

A few clean up tasks will make sure the graph is ready for more work.

Clean up the Edges

  • If you loaded an in-common-with file, remove duplicates (NodeXL ribbon, Prepare data button).
  • On the Edges worksheet, sort the Vertex 1 column from smallest to largest using the dropdown on the column header.
  • Filter the Vertex 1 column to only show ‘zzz delete’ entries.
    image
  • Highlight those lines and delete them.
  • Clear the filter afterwards.

Excel tips:

  • To quickly select a range of rows, select the top cell you want to include. With the Shift key held down, tap the End key and then the Down arrow.
  • To delete rows, move to the Home ribbon and click the Delete button. Choose either Delete Sheet Rows or Delete Table Rows.
    image

Clean up the Vertices

There should only be one row labelled  ‘zzz delete’ to get rid of and it will be at the very bottom of the Vertices sheet. Sort the column to find it if not. You can get rid of it, or just enter ‘Skip’.

Fix up the dot sizes

Earlier, we sized the dots according to the value in the sharedCM column so that we would have a visual indication of how close the relationship with the match is. Now that you have two (or more!) sharedCM columns it’s very likely that they are scattered with blank cells. All those dots will be the default dot size.

The easiest option is to set all the dots to the same size by using the Autofill columns button to clear the size column.

Personally, I prefer having larger and smaller dots. To fill in the blanks, I added a new column to the Vertices worksheet with a formula that returns the larger of the two sharedCM values. To do this I used the MAX function. The AVERAGE function might be a good option if you have loaded several siblings.

  • Add a column to the Vertices sheet by entering a new column heading in the first empty cell in the heading row. ‘New Size’ will do for a heading.
  • Select the first empty cell in the new column.
  • Move to the Home ribbon and change the cell format from ‘Text’ to ‘General’.
    image
  • Enter your preferred formula (see below if you need help). It should automatically fill in all the way down the table.

When you’re happy with the formula, use the Autofill columns button to transfer the content of your new column into the Vertex Size property.

Excel tip:

To enter the MAX or AVERAGE formula, start by typing in the formula name and an opening bracket:

=MAX( 

Then click on each cell (type a commas in between each click) that the calculation should use. You can enter as many elements as you want. Make sure you’re clicking in the same row as your formula. Finish off by entering a closing round bracket. It will look something like this:

=MAX([@sharedCM],[@[sharedCM Dad]])

Or type:

=MAX(AF3,AG3)

(check the cell references match your sheet).

Important note: Formulas and PC performance

Usually when you enter a formula in Excel it calculates so quickly that the result seems to pop up instantaneously. When you make a change in a worksheet any dependant cells (and their dependant cells and so on down the line) are recalculated in the blink of an eye.

We’ve just entered a formula all the way down a long table. This shouldn’t pose too much of a problem…. until it does. It might be when you run the grouping calculations again, or next time you load new data. With potentially tens of thousands of cells to recalculate those fractions of a second start to add up and Excel may stop responding.

There are two options to choose from that will lighten the load.

  1. Replace the formula with values: Highlight the column, Copy. Paste as values.
    image
    If you choose option 1, you’ll need to recreate the formulas when you load new data.

    OR

  2. Stop Excel from automatically calculating. You’ll find Calculation Options on the Formulas ribbon.
    If you do this you will need trigger recalculation of the worksheet yourself when required, either by pressing the Calculate Now button, or by pressing F9 on the keyboard.
    image
    The calculation choice will be saved with the worksheet. Be aware that any other worksheet that is open at the same time will also be affected, and the calculation choice saved for them as well. Also, the setting saved in the first workbook opened in any session is then applied to any other workbooks opened in the same session! It’s probably better to check the setting before you do anything with heavy calculations… and…. if you choose this option, remember what you have done! Formulas may look like they are working when you fill them in, but they won’t calculate correctly until you press F9.
    (In practice it’s not all quite so troublesome as it sounds).

Run clustering calculations

Did you read the important note about PC performance? Hopefully one column of formulas won’t be too much of a strain, but if you have any doubt please take one of the actions above, just in case!

Re-run the clustering algorithm of your choice and lay the graph out once more.

Explore!

In the next post I’ll show you how to colour code your matches.

Friday, August 4, 2017

Visualising Ancestry DNA matches-Part 8-Adding known ancestors

Ready for the next step? If you need to catch up, refer to the index to find your way.

So far all of the dots on the graph represent individuals, and the lines represent (believed) DNA connections. What if we expanded our idea of what the dots on the graph could represent to include ancestral couples? Then we could draw lines (which still represent DNA linkages) between matches and their known ancestors.

Example

imageJohn Tregonning and Mary Isaac are my 3xgreat-grandparents. They are also known ancestors for one of my matches. I’ve added a marker for this ancestral pair, and a line connecting their other known descendant to the marker.

I noticed that one of the other matches in the same group descended from a David Isaac – the surname caught my eye. Through a combination of building trees up and down, and by contacting private and no-tree owners, I learned that at least five matches from this group descend from David Isaac and Maryann Coomb via various of their children. I decided to also add David Isaac and Maryann Coomb to my graph as it seems likely that I have some sort of DNA connection to them.

In a perfect world where everyone had complete public trees with consistent spelling, David Isaac and Maryann Coomb should appear on Ancestry as “New Ancestor Discoveries” (except that in a perfect world they would be “New Relative Discoveries”). It’s not a perfect world and I don’t expect that kind of hint to pop up on Ancestry any time soon.

Using the graph this way helps me to not only find that information but to keep track of and visualise what I’ve found.

Adding the information

Although you can add people and relationships directly to the graph file I prefer to compile the information in a separate file (the Additional Input file) and then import it. If something goes wrong it’s much easier to delete some lines, correct a small file and reload than to unscramble a file with ten of thousands of rows.

I’ve provided instructions for both methods. I find that compiling the Ancestry match IDs is the most difficult part of the process – I’ve also provided some instructions for a shortcut that may help in making the match ID list.

Method 1: Additional Input file method

Enter the following information in the Additional Input file:

  • matchid : match’s AncestryID
  • Match name : match’s name (for reference only, not loaded)
  • Match admin : match’s admin (for reference only, not loaded)
  • Vertex 2 : ancestor’s name eg ‘John Tregonning and Mary Isaac’
    If you enter the same ancestor(s) for multiple matches, make sure the spelling, punctuation and spaces are exactly the same each time.
  • Name : as for Vertex 2
  • Vertex Type : ‘Ancestor’
  • Edge Type : ‘Ancestor’
  • If you would like to be able to apply labels for only ancestors (not for everyone) add an extra column to the file called Ancestor Label and enter their names in that column as well. image

There is some repetition here, but it will give us flexibility to do other things later.

When you import the file (NodeXL Basic ribbon, Import button, From Open Workbook…. option) choose the following options:

  • Columns have headers box should be ticked.
  • Under Is Edge Column select these (and no others)
    • matchid
    • Vertex2
    • Edge type
  • Under Is Vertex 2 Property Column select these (and no others)
    • Name
    • Vertex Type
    • Visibility (not necessary if you don’t need to update the ‘Skip’ lines for anyone)
    • Ancestor label
  • Which edge column is Vertex 1? dropdown ‘matchid’
  • Which edge column is Vertex 2? dropdown ‘Vertex 2’

Rerun the grouping and refresh the graph to see the new elements.


Method 2: Direct entry method

To add points to the graph manually you will need to add a row on the Edges worksheet for each DNA connection you want to make. That row needs two identifiers: one for the match and one for the ancestor(s). 

  • Move to the bottom of the Edges worksheet (see tip below)
  • Enter the Ancestry ID for your DNA match in a new row under the Vertex 1 column.
  • The second identifier (Vertex 2 column) should be an identifier for the known ancestor(s). Since they don’t already have an identifier just use their names – eg ‘John Tregonning and Mary Isaac’.

It doesn’t matter which identifier is Vertex 1 and which is Vertex 2, this just happens to be the convention I’ve settled on. That’s enough to create the relationship. When you refresh the graph a new row will automatically be created on the Vertices worksheet.

A little extra information will help us find those lines again if we need to and will give us more flexibility later.

  • On the Edges worksheet:
    • Add a column called Edge Type, and set the value to ‘Ancestor’ for these matches.
      image
  • On the Vertices worksheet,
    • If you haven’t refreshed the graph yet create a line for each Ancestral pair, then
    • Add the ancestor identifier (ie their names) to the Vertex column AND the Name column.
    • Add a column called Vertex Type and set the value to ‘Ancestor’ for the appropriate rows.
    • If you would like to be able to apply labels for only ancestors (not for everyone) then add another column called Ancestor Label to the Vertices worksheet and enter the ancestor identifier (ie their names) there as well.
      image

When you’re trying to link data, spelling and punctuation matter! Make sure that you enter the ancestor names 100% consistently across your matches and the two sheets.

Rerun the grouping and refresh the graph to see the new elements.

Excel tips:

To add a column, just type a label that will become the column header in the first empty cell in row 2.

To quickly move all the way to the bottom of a full column: Select any cell in the column. On your keyboard tap the End button and then the down arrow.

Shortcut for assembling Ancestry match IDs

I find that the hardest part is assembling all those Ancestry match IDs. You may be able to speed up the process by extracting the list of match IDs from your match list.

  • If using the Additional Input file (or refer to Part 2 to create one), open it up so that it is ready and waiting.
  • Open the matches file “m_YourName.csv”
  • Select any cell within the table area. On the Insert ribbon, click Table.
    image
  • The appropriate range will be automatically selected. Make sure My table has headers is checked, and click OK.
    image
  • The appearance of the table will change and drop down filters will appear on each column header.
  • Use the drop down on the Hint column to filter for matches with a shared ancestor hint.
    image
  • Click and drag (or click and Shift-Click) to highlight all the visible rows for the matchid, name and admin columns.
  • Copy
    image
  • Switch back to the Additional input file and Paste these into the first available empty cell under matchid.
    image

Fill in the other columns as above.

Additional tip: You could filter the list to see details for people with notes, or who have the value TRUE in the ‘starred’ column, depending on how you’ve been using these.

Formatting and labelling

We added a column called Ancestor Label which contained duplicated name information. The purpose of this was to allow you to leave name labels off for your matches, but show them for ancestors if you wish. To apply the name labels use the Autofill Columns button.

Labelling tip: If you want to remove existing labels, click the arrow next to the drop down and you will find an option to clear the label column (you won’t see the change until you refresh the graph). image

I’ve applied different formatting to the Ancestor markers and lines so that it will be clear to me what they are. We’ll go into other methods in a future post – but for now you can alter them using the same method as described in the previous post.

  • Select any rows on the Vertices worksheet that contain ancestors (it may be helpful to sort the Vertex Type column if they are not all together).
  • Right click a highlighted line on the chart to access the right click menu.
  • Click Edit Selected Edge Properties… for line formatting options.
  • Select the rows again if you need to.
  • Right click a highlighted dot to access the right click menu again and click Edit Selected Vertex Properties… for marker formatting options
    OR
    Make the changes using buttons on the NodeXL ribbon.
    image

I set the edge Style to ‘dot’, and the vertex Shape to ‘label’ in the example at the start of this post.

Applying the marker changes

If you’ve been following along, you’ll find that the Edge colour changes work, but Vertex colour and shape changes don’t. There’s a setting that will fix that.

To use your selected Vertex colours and shapes:

  • Select the Groups dropdown on the NodeXL Basic ribbon.
    image
  • You’ll see an options box that directs NodeXL Basic whether to use colours and shapes from the Groups sheet, or to take them from the Vertices worksheet. If you use colours from the Vertices worksheet you’ll lose the rainbow of group colours but gain the ability to choose your own colours point by point. Shapes work similarly.
  • I elected to keep the bright group colours for now.
  • I wanted to change the shape of the marker so I changed the option under What shapes should be used for the groups’ vertices? and clicked OK.
    image

More ideas, and next steps

If you’re feeling adventurous, you might like to try adding points for non-person information such as a particular place, an unusual surname, or even an ethnicity. I’ve played with doing this. It worked quite well if the value being linked was uncommon  (‘Smith’ was a disaster!!) but ultimately I decided that colour coding these values (coming soon!) worked better for me.

The next posts are the ones that I’m really excited about showing you! They’re what I’ve been building to all this time. First we’re going to think about combining the kits we manage. Then we’ll move on to colour coding – I’ll show you how to set up colour coding schemes and switch between them at will.