Package 'alphabetr' reference manual

Title:	Algorithms for High-Throughput Sequencing of Antigen-Specific T Cells
Description:	Provides algorithms for frequency-based pairing of alpha-beta T cell receptors.
Authors:	Edward Lee [aut, cre]
Maintainer:	Edward Lee <[email protected]>
License:	AGPL (>= 3)
Version:	0.2.2
Built:	2025-03-06 04:04:09 UTC
Source:	https://github.com/edwardslee/alphabetr

Identify candidate alpha/beta pairs.

Description

bagpipe() uses the alphabetr resampling procedure on sequencing data to identify candidate alpha/beta pairs. The procedure takes a subsample of the data without replacement, calculates association scores with chain_scores, and then for each well, uses the Hungarian algorithm to determine the most likely pairings for the chains found in the well. Each time this is done is a replicate, and the number of replicates is specified as an option. A threshold is then used to filter the candidate pairs that appear in proportion of the replicates larger than the threshold, resulting in the final list of candidate pairs. Bagpipe is an acronym for bootstrapping alphabetr generated pairs procedure (based on older versions that utilized bootstrapping)

Usage

bagpipe(alpha, beta, replicates = 100, frac = 0.75, bootstrap = FALSE)
bagpipe(alpha, beta, replicates = 100, frac = 0.75, bootstrap = FALSE)

Arguments

`alpha`	Matrix recording which alpha chains appear in each well of the data. See `create_data`.
`beta`	Matrix recording which beta chains appear in the each well of the data. See `create_data`.
`replicates`	The number of times the resampling procedure is repeated, i.e. the number of replicates. At least 100 replicates is recommended.
`frac`	The fraction of the wells resampled in each replicate. Default is 75% of the wells
`bootstrap`	Legacy option. Calls a bootstrapping strategy (which resamples with replacement) instead of sampling a subset without replacement.

Value

A n x 5 matrix where n is the number of clones determined by bagpipe(). Each row represents the chains of the clone. Columns 1 and 2 represent the beta chain indices of the clone. Columns 3 and 4 represent the alpha chain indices of the clone. Column 5 represents the number of replicates that this clone was found in. Column 5 is used to filter our the clones that have not be determined by a certain "threshold" proportion of replicates.

Note that column 1 == column 2 and column 3 == column 4 since bagpipe() does not try to determine dual-alpha or dual-beta clones.

Examples

 # see the help for create_clones() and create_data()
 clones <- create_clones(numb_beta = 1000,
                         dual_alpha = .3,
                         dual_beta = .06,
                         alpha_sharing = c(0.80, 0.15, 0.05),
                         beta_sharing  = c(0.75, 0.20, 0.05))
 dat <- create_data(clones$TCR, plate = 5,
                    error_drop = c(.15, .01),
                    error_seq  = c(.05, .001),
                    error_mode = c("lognormal", "lognormal"),
                    skewed = 10,
                    prop_top = 0.6,
                    dist = "linear",
                    numb_cells = matrix(c(50, 480), ncol = 2))

 ## Not run: 
 # normally want to set replicates to 100 or more
 pairs <- bagpipe(alpha = dat$alpha,
               beta  = dat$beta,
               replicates = 5,
               frac = 0.75,
               bootstrap = FALSE)

 # using a threshold of 0.3 of replicates
 pairs <- pairs[pairs[, 5] > 0.3, ]
 
## End(Not run)
# see the help for create_clones() and create_data()
 clones <- create_clones(numb_beta = 1000,
                         dual_alpha = .3,
                         dual_beta = .06,
                         alpha_sharing = c(0.80, 0.15, 0.05),
                         beta_sharing  = c(0.75, 0.20, 0.05))
 dat <- create_data(clones$TCR, plate = 5,
                    error_drop = c(.15, .01),
                    error_seq  = c(.05, .001),
                    error_mode = c("lognormal", "lognormal"),
                    skewed = 10,
                    prop_top = 0.6,
                    dist = "linear",
                    numb_cells = matrix(c(50, 480), ncol = 2))

 ## Not run: 
 # normally want to set replicates to 100 or more
 pairs <- bagpipe(alpha = dat$alpha,
               beta  = dat$beta,
               replicates = 5,
               frac = 0.75,
               bootstrap = FALSE)

 # using a threshold of 0.3 of replicates
 pairs <- pairs[pairs[, 5] > 0.3, ]
 
## End(Not run)

Calculate association scores between alpha and beta chain pairs.

Description

chain_scores() calculates association scores between every pair of alpha and beta chains based on the number of concurrent well appearances each alpha and beta pair makes, scaled inversely by the number of unique chains in that well. See Lee et. al. for more information about this procedure.

Usage

chain_scores(data_a, data_b)
chain_scores(data_a, data_b)

Arguments

`data_a`	Matrix recording which alpha chains appear in each well of the data. See `create_clones`.
`data_b`	Matrix recording which beta chains appear in the each well of the data. See `create_clones`.

Value

A list containing the alpha and beta association scores. Accessed with list$ascores and list$bscores respectively.

Examples

 # see the help for create_clones() and create_data()
 clones <- create_clones(numb_beta = 1000,
                         dual_alpha = .3,
                         dual_beta  = .06,
                         alpha_sharing = c(0.80, 0.15, 0.05),
                         beta_sharing  = c(0.75, 0.20, 0.05))
 dat <- create_data(clones$TCR, plate = 5,
                    error_drop = c(.15, .01),
                    error_seq  = c(.05, .001),
                    error_mode = c("lognormal", "lognormal"),
                    skewed = 10,
                    prop_top = 0.6,
                    dist = "linear",
                    numb_cells = matrix(c(50, 480), ncol = 2))

 #this is done internally in bagpipe()
 scores <- chain_scores(data_a = dat$alpha, data_b = dat$beta)
 scores <- scores$ascores + t(scores$bscores)

# see the help for create_clones() and create_data()
 clones <- create_clones(numb_beta = 1000,
                         dual_alpha = .3,
                         dual_beta  = .06,
                         alpha_sharing = c(0.80, 0.15, 0.05),
                         beta_sharing  = c(0.75, 0.20, 0.05))
 dat <- create_data(clones$TCR, plate = 5,
                    error_drop = c(.15, .01),
                    error_seq  = c(.05, .001),
                    error_mode = c("lognormal", "lognormal"),
                    skewed = 10,
                    prop_top = 0.6,
                    dist = "linear",
                    numb_cells = matrix(c(50, 480), ncol = 2))

 #this is done internally in bagpipe()
 scores <- chain_scores(data_a = dat$alpha, data_b = dat$beta)
 scores <- scores$ascores + t(scores$bscores)

Combines the frequency estimation results from single TCR clones and dual TCR clones

Description

combine_freq_results() combines the results of the frequency estimation performed on single TCR clones (from the output of bagpipe) and the frequency estimation performed on dual clones. The code will find the rows of the single TCR frequency results that are represented by the dual clones and replace them with the appropriate dual clone entry.

Usage

combine_freq_results(single, dual)
combine_freq_results(single, dual)

Arguments

`single`	Frequency estimation results of single TCR clones (usually from the first time `freq_estimate` is called)
`dual`	Frequency estimation results of dual TCR-alpha clones

Value

A data.frame with the same structure as the output of freq_estimate. If two single "clones" in the single data.frame is represented by a dual clone in dual, then it is removed and replaced with one row represented by the dual clone.

Create a synthetic set of clones with a specific underlying clonal structure

Description

create_clones() creates a set of (beta1, beta2, alpha1, alpha2) quadruples that represent the indices of the chains of clones. The function will take a fixed number of unique beta chains that are in the T cell population, and then use the degree of beta and alpha sharing to determine the number of unique alpha chains in the populations. These chains will then be randomly assigned to each other, with a proportion of them being dual TCR clones (i.e. alpha1 != alpha2 and/or beta1 != beta2), forming our random list of clones with their chain indices.

Usage

create_clones(numb_beta, dual_beta, dual_alpha, alpha_sharing, beta_sharing)
create_clones(numb_beta, dual_beta, dual_alpha, alpha_sharing, beta_sharing)

Arguments

`numb_beta`	The number of unique betas in the clonal population
`dual_beta`	The proportion of clone that are dual TCRbeta clones, i.e. has two distinct beta chains
`dual_alpha`	The proportion of clones that are dual TCRalpha clones, i.e. has two distinct alpha chains
`alpha_sharing`	A vector where the ith position represents the proportion of alpha chains that are shared by i clones; alpha chains can be shared by up to 7 clones
`beta_sharing`	A vector where the ith position represents the proportion of beta chains that are shared by i clones; beta chains can be shared by up to 5 clones

Value

A list of four different matrices. Each matrix is has dimensions n x 4, where n is the total number of clones and each row represents the chains of a clone. Column 1 and column 2 are the beta index/indices of the beta chain(s) used by the clone. Column 3 and 4 are the alpha index/indices of the alpha chain(s) used by the clone. If a clone has a single beta chain, then col 1 and col 2 will be equal. If a clone has a single alpha chain, then col 3 and col 4 will be equal.

Examples

 # Creating a population containing 1000 beta chains; 10% of clones with
 # dual-beta TCRs and 30% of clones with dual TCRs; 75% beta shared by one
 # clone, 20% by two clones, 5% by three clones; 80% alpha chains shared by
 # one clone, 15% by two clones, and 5% by three clones

 clones <- create_clones(numb_beta = 1000,
                         dual_alpha = .3,
                         dual_beta  = .06,
                         alpha_sharing = c(0.80, 0.15, 0.05),
                         beta_sharing  = c(0.75, 0.20, 0.05))

# Creating a population containing 1000 beta chains; 10% of clones with
 # dual-beta TCRs and 30% of clones with dual TCRs; 75% beta shared by one
 # clone, 20% by two clones, 5% by three clones; 80% alpha chains shared by
 # one clone, 15% by two clones, and 5% by three clones

 clones <- create_clones(numb_beta = 1000,
                         dual_alpha = .3,
                         dual_beta  = .06,
                         alpha_sharing = c(0.80, 0.15, 0.05),
                         beta_sharing  = c(0.75, 0.20, 0.05))

Simulate sequencing data obtained from the alphabetr approach with a specified clonal structure

Description

create_data() simulates an alphabetr sequencing experiment by sampling clones from a clonal structure specified by the user. The clones are placed on a frequency distribution where a fixed number clones represents the top proportion of the population in frequency and the other clones represent the rest of the population in frequency. Different error models and different sampling strategies can be simulated as well.

Usage

create_data(TCR, plates, error_drop, error_seq, error_mode, skewed, prop_top,
  dist = "linear", numb_cells)
create_data(TCR, plates, error_drop, error_seq, error_mode, skewed, prop_top,
  dist = "linear", numb_cells)

Arguments

`TCR`	The specified clonal structure, which can be created from `create_clones`.
`plates`	The number of plates of data.
`error_drop`	A vector of length 2 with the mean of the drop error rate and the sd of the drop error rate.
`error_seq`	A vector of length 2 with the mean of the in-frame error rate and the sd of the in-frame error rate.
`error_mode`	A vector of two strings determining the "mode" of the error models. The first element sets the mode of the drop errors, and the second element sets the mode of the in-frame errors. The two modes available are "constant" for a constant error rate and "lognormal" for error rates drawn from a lognormal distribution. If the mode is set to "constant" the sd specified in `error_drop` and/or `error_seq` will be ignored.
`skewed`	Number of clones represent the top proportion of the population by frequency (which is specified by `prop_top`).
`prop_top`	The proportion of the population in frequency represented by the number of clones specified by `skewed`.
`dist`	The distribution of frequency of the top clones. Currently only "linear" is available.
`numb_cells`	A two column matrix determining the sampling strategy of the experiment. The first column represents the number of cells per well, and the second column represents the number of wells with that sample size. The sum of column 2 should equal 96 times the number of plates.

Value

A list of length 2. The first element is a matrix representing the data of the alpha chains, and the second element is a matrix representing the data of beta chains. The matrix represents the sequencing data by representing the wells of the data by rows and the chain indices by column. Entry [i, j] of the matrix represents if chain j is found in well i (yes == 1, no == 0). e.g. if alpha chain 25 is found in well 10, then [10, 25] of the alpha matrix will be 1.

Examples

 # see the help for create_clones() for details of this function call
 clones <- create_clones(numb_beta = 1000,
                         dual_alpha = .3,
                         dual_beta = .06,
                         alpha_sharing = c(0.80, 0.15, 0.05),
                         beta_sharing  = c(0.75, 0.20, 0.05))

 # creating a data set with 5 plates, lognormal error rates, 10 clones
 # making up the top 60% of the population in frequency, and a constant
 # sampling strategy of 50 cells per well for 480 wells (five 96-well plates)
 dat <- create_data(clones$TCR, plate = 5,
                    error_drop = c(.15, .01),
                    error_seq  = c(.05, .001),
                    error_mode = c("lognormal", "lognormal"),
                    skewed = 10,
                    prop_top = 0.6,
                    dist = "linear",
                    numb_cells = matrix(c(50, 480), ncol = 2))

# see the help for create_clones() for details of this function call
 clones <- create_clones(numb_beta = 1000,
                         dual_alpha = .3,
                         dual_beta = .06,
                         alpha_sharing = c(0.80, 0.15, 0.05),
                         beta_sharing  = c(0.75, 0.20, 0.05))

 # creating a data set with 5 plates, lognormal error rates, 10 clones
 # making up the top 60% of the population in frequency, and a constant
 # sampling strategy of 50 cells per well for 480 wells (five 96-well plates)
 dat <- create_data(clones$TCR, plate = 5,
                    error_drop = c(.15, .01),
                    error_seq  = c(.05, .001),
                    error_mode = c("lognormal", "lognormal"),
                    skewed = 10,
                    prop_top = 0.6,
                    dist = "linear",
                    numb_cells = matrix(c(50, 480), ncol = 2))

Simulate sequencing data obtained single-cell sequencing

Description

create_data_singlecells() simulates a single-cell sequencing experiment by sampling clones from a clonal structure specified by the user and using the same error models and frequency distributions used in create_data. These functions are almost identical except this one simulates the sampling and sequencing of single T cells.

Usage

create_data_singlecells(TCR, plates = 5, error_drop = c(0.15, 0.01),
  error_seq = c(0.05, 0.01), error_mode = c("constant", "constant"),
  skewed = 15, prop_top = 0.5, dist = "linear")
create_data_singlecells(TCR, plates = 5, error_drop = c(0.15, 0.01),
  error_seq = c(0.05, 0.01), error_mode = c("constant", "constant"),
  skewed = 15, prop_top = 0.5, dist = "linear")

Arguments

`TCR`	The specified clonal structure, which can be created from `create_clones`.
`plates`	The number of plates of data. The number of single-cells is 96 times `plates`.
`error_drop`	A vector of length 2 with the mean of the drop error rate and the sd of the drop error rate.
`error_seq`	A vector of length 2 with the mean of the in-frame error rate and the sd of the in-frame error rate.
`error_mode`	A vector of two strings determining the "mode" of the error models. The first element sets the mode of the drop errors, and the second element sets the mode of the in-frame errors. The two modes available are "constant" for a constant error rate and "lognormal" for error rates drawn from a lognormal distribution. If the mode is set to "constant" the sd specified in `error_drop` and/or `error_seq` will be ignored.
`skewed`	Number of clones represent the top proportion of the population by frequency (which is specified by `prop_top`).
`prop_top`	The proportion of the population in frequency represented by the number of clones specified by `skewed`.
`dist`	The distribution of frequency of the top clones. Currently only "linear" is available.

Value

A list of length 3. The first element is a matrix representing the data of the alpha chains ($alpha), and the second element is a matrix representing the data of beta chains ($beta). The matrix represents the sequencing data by representing the wells of the data by rows and the chain indices by column. Entry [i, j] of the matrix represents if chain j is found in well i (yes == 1, no == 0). e.g. if alpha chain 25 is found in well 10, then [10, 25] of the alpha matrix will be 1.

The third element of the list ($drop) is a matrix that records the index of the clone sampled in the well (col 1), records if a drop error occurred (col 2), and record if an in-frame error occurred (col 3).

Examples

 # see the help for create_clones() for details of this function call
 clones <- create_clones(numb_beta = 1000,
                      dual_alpha = .3,
                      dual_beta  = .06,
                      alpha_sharing = c(0.80, 0.15, 0.05),
                      beta_sharing  = c(0.75, 0.20, 0.05))

 # creating a data set with 480 single cells, lognormal error rates, 10 clones
 # making up the top 60% of the population in frequency, and a constant
 # sampling strategy of 50 cells per well for 480 wells (five 96-well plates)
 dat <- create_data_singlecells(clones$TCR, plate = 5,
                                error_drop = c(.15, .01),
                                error_seq  = c(.05, .001),
                                error_mode = c("lognormal", "lognormal"),
                                skewed = 10,
                                prop_top = 0.6,
                                dist = "linear")


# see the help for create_clones() for details of this function call
 clones <- create_clones(numb_beta = 1000,
                      dual_alpha = .3,
                      dual_beta  = .06,
                      alpha_sharing = c(0.80, 0.15, 0.05),
                      beta_sharing  = c(0.75, 0.20, 0.05))

 # creating a data set with 480 single cells, lognormal error rates, 10 clones
 # making up the top 60% of the population in frequency, and a constant
 # sampling strategy of 50 cells per well for 480 wells (five 96-well plates)
 dat <- create_data_singlecells(clones$TCR, plate = 5,
                                error_drop = c(.15, .01),
                                error_seq  = c(.05, .001),
                                error_mode = c("lognormal", "lognormal"),
                                skewed = 10,
                                prop_top = 0.6,
                                dist = "linear")

Calculate likelihood of two beta-sharing candidate alpha-beta pairs deriving from a dual clone

Description

dual_discrim_dual_likelihood() is used within dual_top to calculate the likelihood that two alpha-beta pairs identified by bagpipe sharing the same beta chain derive from a single dual-alpha clone (instead of two distinct clones sharing the same beta).

Usage

dual_discrim_dual_likelihood(est, err, numb_cells, numb_wells, binomials)
dual_discrim_dual_likelihood(est, err, numb_cells, numb_wells, binomials)

Arguments

`est`	Frequency estimate of the putative dual-alpha clone
`err`	Mean drop error rate
`numb_cells`	Vector containing the number of cells per well
`numb_wells`	Vector containing the number of wells with the sample sizes given by `numb_cells`
`binomials`	Calculations of the needed binomial coefficients; this is faster in R than in Rcpp (from my own tests)

Value

A numeric containing the negative log likelihood

Calculate likelihood of two beta-sharing candidate alpha-beta pairs deriving from a dual clone

Description

dual_discrim_shared_likelihood() is used within dual_top to calculate the likelihood that two alpha-beta pairs identified by bagpipe sharing the same beta chain derive from a two distinct clones sharing the same beta chain dual-alpha clone (instead of a single dual-alpha clone)

Usage

dual_discrim_shared_likelihood(est1, est2, err, numb_cells, numb_wells,
  binomials, multinomials)
dual_discrim_shared_likelihood(est1, est2, err, numb_cells, numb_wells,
  binomials, multinomials)

Arguments

`est1`	Frequency estimate of the first alpha-beta clone
`est2`	Frequency estimate of the second alpha-beta clone
`err`	Mean drop error rate
`numb_cells`	Vector containing the number of cells per well
`numb_wells`	Vector containing the number of wells with the sample sizes given by `numb_cells`
`binomials`	Calculations of the needed binomial coefficients; this is faster in R than in Rcpp (from my own tests)
`multinomials`	Calculations of the needed multinomial coefficients; this is way faster in R due to vectorization

Value

A numeric containing the negative log likelihood

Calculate dual depths and false dual rates for simulated alphabetr experiments

Description

dual_eval() is used in simulation situations to compare the duals determined by dual_top and dual_tail (which can be combined with rbind()) to the duals in the simulated T cell population.

Usage

dual_eval(duals, pair, TCR, number_skewed, TCR_dual)
dual_eval(duals, pair, TCR, number_skewed, TCR_dual)

Arguments

`duals`	A 4 column matrix (col 1 + 2 = beta indices, col 3 + 4 = alpha indices) containing the indices of dual-alpha clones. The output of `dual_top` and `dual_tail` are in this form (and the outputs of these two functions can combined by using `rbind()`)
`pair`	The output of `bagpipe`
`TCR`	The clonal structure of the simulated T cell population. This is obtained by subsetting the `TCR` element of the output of `create_clones`
`number_skewed`	The number of clones represent the top proportion of the T cell population by frequency (this is the same `number_skewed` argument used when `create_clones` is called)
`TCR_dual`	The dual clones of the simulated T cell population. This is obtained by subsetting the `dual_alph` element of the output of `create_clones`

Value

A data.frame with the following columns:

fdr, the false dual rate
numb_cand_duals, the number of duals identified
adj_depth_top, the adjusted dual depth of top clones
abs_depth_top, the absolute dual depth of top clones
numb_correct_top, the number of correctly identified dual clones in the top
numb_duals_ans_top, the number of top dual clones in the simulated T cell population
numb_poss_top, the number of top dual clones whose beta and both alpha chains were identified by bagpipe()
numb_unestimated_top, number of top dual clones whose frequencies could not be calculated (usually because the clones appeared in every well of the data)
adj_depth_tail, the adjusted dual depth of tail clones
abs_depth_tail, the absolute dual depth of tail clones
numb_correct_tail, the number of correctly identified tail clones
numb_duals_ans_tail, the number of dual tail clones in the simulated T cell population
numb_poss_tail, the number of tail dual cloens whose beta and both alpha chains were identified by bagpipe()
numb_unestimated_tail, the number of tail clones whose frequencies could not be calculated

Discriminate between beta-sharing clones and dual-alpha TCR clones (optimized for rare clones)

Description

dual_tail() distinguishes between clones that share a common beta chain and dual TCR clones with two productive alpha chains. The procedure tests the null hypothesis that two candidate alpha, beta pairs with the same beta represent two separate clones by using the frequency estimates to calculate the number of wells that both clones are expected to be in. This is compared to the actual number of wells that both clones appear in, and if the actual number is greater than the expected number, than the pairs are chosen to represent a dual TCR clone.

Usage

dual_tail(alpha, beta, freq_results, numb_cells)
dual_tail(alpha, beta, freq_results, numb_cells)

Arguments

`alpha`	Matrix recording which alpha chains appear in each well of the data. See `create_data`.
`beta`	Matrix recording which beta chains appear in the each well of the data. See `create_data`.
`freq_results`	Output of `freq_estimate`.
`numb_cells`	Vector containing the number of cells sampled in the wells of each column of the plates.

Value

A n x 3 matrix where n is the number of candidate clones, column 1 is the beta index of the clone, and column 2-3 are the alpha indices of the clone

Discriminate between beta-sharing clones and dual-alpha TCR clones (optimized for common clones)

Description

dual_top() distinguishes between clones that share a common beta chain and dual TCR clones with two productive alpha chains. The procedure calculates the likelihood that two (alpha, beta) pairs (with common a beta chain) come from two distinct clones sharing the same beta chain vs the likelihood that the two pairs derive from a dual TCR-alpha clone. A significant difference between the two likelihoods is indicative of a dual alpha clone, and these clones are returned as dual clones.

Usage

dual_top(alpha, beta, pair, error, numb_cells)
dual_top(alpha, beta, pair, error, numb_cells)

Arguments

`alpha`	Matrix recording which alpha chains appear in each well of the data. See `create_data`.
`beta`	Matrix recording which beta chains appear in the each well of the data. See `create_data`.
`pair`	A matrix where each row is a beta/alpha pair, column 1 and 2 are the beta indices, and column 3 and 4 are the alpha indices, and column 5 is the proportion of replicates the clone was found in (or equal to -1 if the clone is dual)
`error`	The mean error "dropped" chain rate due to PCR or sequencing errors.
`numb_cells`	The number of cells per well in each column of the plates. Should be a vector of 12 elements.

Value

A matrix of dual-alpha clones, where col 1 and 2 are beta indices of the clone (which should be equal) and col 3 and 4 are alpha indices of the clone (which are different).

Estimation of frequencies of clones identified by `alphabetr`

Description

freq_estimate() estimates the frequencies of clones with confidence intervals by using a maximum likelihood approach. The function looks at the wells that a chains of a clone appear in and determines the most likely frequency that explains the data.

Usage

freq_estimate(alpha, beta, pair, error = 0.15, numb_cells)
freq_estimate(alpha, beta, pair, error = 0.15, numb_cells)

Arguments

`alpha`	Matrix recording which alpha chains appear in each well of the data. See `create_data`.
`beta`	Matrix recording which beta chains appear in the each well of the data. See `create_data`.
`pair`	A matrix where each row is a beta/alpha pair, column 1 and 2 are the beta indices, and column 3 and 4 are the alpha indices, and column 5 is the proportion of replicates the clone was found in (or equal to -1 if the clone is dual)
`error`	The mean error "dropped" chain rate due to PCR or sequencing errors.
`numb_cells`	The number of cells per well in each column of the plates. Should be a vector of 12 elements.

Value

A data frame with frequency estimates and confidence intervals

Examples

 ## Not run: 
 # obtained from the output of bagpipe()
 pairs <- pairs[pairs[, 5] > 0.3, ]
 freq  <- freq_estimate(alpha = dat$alpha,
                        beta = dat$beta,
                        pair = pairs,
                        numb_cells = matrix(c(50, 480), ncol = 2))
 
## End(Not run)
## Not run: 
 # obtained from the output of bagpipe()
 pairs <- pairs[pairs[, 5] > 0.3, ]
 freq  <- freq_estimate(alpha = dat$alpha,
                        beta = dat$beta,
                        pair = pairs,
                        numb_cells = matrix(c(50, 480), ncol = 2))
 
## End(Not run)

Calculate the precision, CV, and accuracy of frequency estimates

Description

freq_eval() will evaluated how well freq_estimate performed by calculating the precision and CV of the frequency estimates for the top clones and by determining the proportion of the top clones whose true clonal frequency lies in the 95-percent CI determined by freq_estimate

Usage

freq_eval(freq, number_skewed, TCR, numb_clones, prop_top)
freq_eval(freq, number_skewed, TCR, numb_clones, prop_top)

Arguments

`freq`	The output of `freq_estimate`
`number_skewed`	The number of clones represent the top proportion of the T cell population by frequency (this is the same `number_skewed` argument used when `create_clones` is called)
`TCR`	The clonal structure of the simulated T cell population. This is obtained by subsetting the `TCR` element of the output of `create_clones`
`numb_clones`	Total number of distinct clones in the parent population
`prop_top`	The proportion of the population in frequency represented by the number of clones specified by `skewed`.

Value

A list with the precision, cv, and accuracy of the frequency estimation.

Calculate likelihood curve of frequency estimates for a dual-alpha or dual-beta TCR clone

Description

Calculate likelihood curve of frequency estimates for a dual-alpha or dual-beta TCR clone

Usage

likelihood_dual(est, err, numb_wells, numb_cells, numb_sample)
likelihood_dual(est, err, numb_wells, numb_cells, numb_sample)

Arguments

`est`	Clonal frequency estimate
`err`	Mean drop error rate
`numb_wells`	A vector with the number of wells with the distinct sample sizes
`numb_cells`	A vector of the distinct sample sizes, i.e. the number of cells per well
`numb_sample`	A vector with the number of wells of the sample size of the same position of `numb_cells` that contains both alpha and the beta chain of the clone (for dual-alpha clones) or both beta and the alpha chain of the clone (for dual-beta clones)

Value

A numeric with the negative log likelihood

Calculate likelihood curve of frequency estimates for a dual-alpha and dual-beta TCR clone

Description

Calculate likelihood curve of frequency estimates for a dual-alpha and dual-beta TCR clone

Usage

likelihood_dualdual(est, err, numb_wells, numb_cells, numb_sample)
likelihood_dualdual(est, err, numb_wells, numb_cells, numb_sample)

Arguments

`est`	Clonal frequency estimate
`err`	Mean drop error rate
`numb_wells`	A vector with the number of wells with the distinct sample sizes
`numb_cells`	A vector of the distinct sample sizes, i.e. the number of cells per well
`numb_sample`	A vector with the number of wells of the sample size of the same position of `numb_cells` that contains both alpha and both betachains the clone

Value

A numeric with the negative log likelihood

Calculate likelihood curve of frequency estimates for a single TCR clone

Description

Calculate likelihood curve of frequency estimates for a single TCR clone

Usage

likelihood_single(est, err, numb_wells, numb_cells, numb_sample)
likelihood_single(est, err, numb_wells, numb_cells, numb_sample)

Arguments

`est`	Clonal frequency estimate
`err`	Mean drop error rate
`numb_wells`	A vector with the number of wells with the distinct sample sizes
`numb_cells`	A vector of the distinct sample sizes, i.e. the number of cells per well
`numb_sample`	A vector with the number of wells of the sample size of the same position of `numb_cells` that contains the alpha and beta chains the clone

Value

A numeric with the negative log likelihood

Read in alphabetr sequencing data into the binary matrix form needed by bagpipe()

Description

read_alphabetr() will read in two different forms of a csv file to convert sequencing data using the alphabetr approach into the binary matrices required by bagpipe. The csv file(s) can have one of two forms. (1) A single csv file with three columns: column 1 containing whether the sequence is "TCRA" or "TCRB"; column 2 containing the well number; and column 3 containing the CDR3 sequence (2) Two CSV files, one for TCRA and one for TCRB, with two columns: column 1 containing the well number and column 2 containing the CDR3 sequence

Usage

read_alphabetr(data = NULL, data_alpha = NULL, data_beta = NULL)
read_alphabetr(data = NULL, data_alpha = NULL, data_beta = NULL)

Arguments

`data`	To read in a 3-column csv file containing both TCRA and TCRB sequencing information
`data_alpha`	To read in a 2-column csv file containing TCRA sequencing information. Must be used in conjunction with the `data_beta` argument and cannot be used with the `data` argument.
`data_beta`	To read in a 2-column csv file containing TCRB sequencing information. Must be used in conjunction with the `data_alpha` argument and cannot eb used with the `data` argument.

Value

A list of two binary matrices that represent the sequencing data and two character vectors that give the CDR3 sequences associated with each chain index.

Examples

## Not run: 
  dat <- read_alphabetr(data = "alphabetr_data.csv")

  # saving the alpha and beta binary matrices
  data_alpha <- dat$alpha
  data_beta  <- dat$beta

  # finding the cdr3 sequences of alpha_2 and beta_4 respectively
  cdr3_alpha2 <- dat$alpha_lib[2]
  cdr3_beta4  <- dat$beta_lib[4]

## End(Not run)
## Not run: 
  dat <- read_alphabetr(data = "alphabetr_data.csv")

  # saving the alpha and beta binary matrices
  data_alpha <- dat$alpha
  data_beta  <- dat$beta

  # finding the cdr3 sequences of alpha_2 and beta_4 respectively
  cdr3_alpha2 <- dat$alpha_lib[2]
  cdr3_beta4  <- dat$beta_lib[4]

## End(Not run)

Package 'alphabetr'

Help Index

Identify candidate alpha/beta pairs.

Description

Usage

Arguments

Value

Examples

Calculate association scores between alpha and beta chain pairs.

Description

Usage

Arguments

Value

Examples

Combines the frequency estimation results from single TCR clones and dual TCR clones

Description

Usage

Arguments

Value

Create a synthetic set of clones with a specific underlying clonal structure

Description

Usage

Arguments

Value

Examples

Simulate sequencing data obtained from the alphabetr approach with a specified clonal structure

Description

Usage

Arguments

Value

Examples

Simulate sequencing data obtained single-cell sequencing

Description

Usage

Arguments

Value

Examples

Calculate likelihood of two beta-sharing candidate alpha-beta pairs deriving from a dual clone

Description

Usage

Arguments

Value

Calculate likelihood of two beta-sharing candidate alpha-beta pairs deriving from a dual clone

Description

Usage

Arguments

Value

Calculate dual depths and false dual rates for simulated alphabetr experiments

Description

Usage

Arguments

Value

Discriminate between beta-sharing clones and dual-alpha TCR clones (optimized for rare clones)

Description

Usage

Arguments

Value

Discriminate between beta-sharing clones and dual-alpha TCR clones (optimized for common clones)

Description

Usage

Arguments

Value

Estimation of frequencies of clones identified by alphabetr

Description

Usage

Arguments

Value

Examples

Calculate the precision, CV, and accuracy of frequency estimates

Description

Usage

Arguments

Value

Calculate likelihood curve of frequency estimates for a dual-alpha or dual-beta TCR clone

Description

Usage

Arguments

Value

Calculate likelihood curve of frequency estimates for a dual-alpha and dual-beta TCR clone

Description

Estimation of frequencies of clones identified by `alphabetr`