Inferring constraint networks

Constraint networks can be inferred with any method of your choosing, such as SNaQ, PhyNEST, PhyloNet, or NANUQ (implemented in the MSCquartets R package). We utilize SNaQ because it is also implemented in Julia, so the workflow can all be contained in a single, straight-forward script.

For each subset, we need to perform the following in order to infer a network with SNaQ:

  1. Prune the estimated gene trees to only include taxa in the subset
  2. Generate SNaQ's input data
  3. Run SNaQ

[!NOTE]

When using SNaQ, it is best practice to (1) set the runs parameter to a large value (at least 10) and (2) infer networks with numbers of hybrids $h=0,1,2,3,...$ before performing model selection to determine the "best" network. For simplicity, here we only perform 1 run for each inferred network and we only infer networks with $h=1$.

# First, make a folder to put the data in
isdir("snaq_data") || mkdir("snaq_data")

# We will store the inferred networks in this array
snaq_networks = Array{HybridNetwork}(undef, length(subsets))

for (i, subset) in enumerate(subsets)
    @info "Inferring network for subset $(i)/$(length(subsets))"

    # 1. Prune estimated gene trees
    subset_gts = prune_networks(est_gts, subset)

    # 2. Generate SNaQ's input data
    df = readtrees2CF(subset_gts, writeTab=false, writeSummary=false)

    # 3. Run SNaQ
    snaq_networks[i] = snaq!(subset_gts[1], df, hmax=1, filename="snaq_data/snaq_subset$(i)", runs=1, seed=42)
end