Title: | Algorithms for High-Throughput Sequencing of Antigen-Specific T Cells |
---|---|
Description: | Provides algorithms for frequency-based pairing of alpha-beta T cell receptors. |
Authors: | Edward Lee [aut, cre] |
Maintainer: | Edward Lee <[email protected]> |
License: | AGPL (>= 3) |
Version: | 0.2.2 |
Built: | 2025-03-06 04:04:09 UTC |
Source: | https://github.com/edwardslee/alphabetr |
bagpipe()
uses the alphabetr resampling procedure on sequencing data
to identify candidate alpha/beta pairs. The procedure takes a subsample of
the data without replacement, calculates association scores with
chain_scores
, and then for each well, uses the Hungarian
algorithm to determine the most likely pairings for the chains found in the
well. Each time this is done is a replicate, and the number of replicates is
specified as an option. A threshold is then used to filter the candidate
pairs that appear in proportion of the replicates larger than the threshold,
resulting in the final list of candidate pairs. Bagpipe is an acronym for
bootstrapping alphabetr generated
pairs procedure (based on older versions
that utilized bootstrapping)
bagpipe(alpha, beta, replicates = 100, frac = 0.75, bootstrap = FALSE)
bagpipe(alpha, beta, replicates = 100, frac = 0.75, bootstrap = FALSE)
alpha |
Matrix recording which alpha chains appear in each well of the
data. See |
beta |
Matrix recording which beta chains appear in the each well of the
data. See |
replicates |
The number of times the resampling procedure is repeated, i.e. the number of replicates. At least 100 replicates is recommended. |
frac |
The fraction of the wells resampled in each replicate. Default is 75% of the wells |
bootstrap |
Legacy option. Calls a bootstrapping strategy (which resamples with replacement) instead of sampling a subset without replacement. |
A n x 5 matrix where n is the number of clones determined by
bagpipe()
. Each row represents the chains of the clone.
Columns 1 and 2 represent the beta chain indices of the clone.
Columns 3 and 4 represent the alpha chain indices of the clone.
Column 5 represents the number of replicates that this clone was found in.
Column 5 is used to filter our the clones that have not be determined by
a certain "threshold" proportion of replicates.
Note that column 1 == column 2 and column 3 == column 4 since
bagpipe()
does not try to determine dual-alpha or dual-beta clones.
# see the help for create_clones() and create_data() clones <- create_clones(numb_beta = 1000, dual_alpha = .3, dual_beta = .06, alpha_sharing = c(0.80, 0.15, 0.05), beta_sharing = c(0.75, 0.20, 0.05)) dat <- create_data(clones$TCR, plate = 5, error_drop = c(.15, .01), error_seq = c(.05, .001), error_mode = c("lognormal", "lognormal"), skewed = 10, prop_top = 0.6, dist = "linear", numb_cells = matrix(c(50, 480), ncol = 2)) ## Not run: # normally want to set replicates to 100 or more pairs <- bagpipe(alpha = dat$alpha, beta = dat$beta, replicates = 5, frac = 0.75, bootstrap = FALSE) # using a threshold of 0.3 of replicates pairs <- pairs[pairs[, 5] > 0.3, ] ## End(Not run)
# see the help for create_clones() and create_data() clones <- create_clones(numb_beta = 1000, dual_alpha = .3, dual_beta = .06, alpha_sharing = c(0.80, 0.15, 0.05), beta_sharing = c(0.75, 0.20, 0.05)) dat <- create_data(clones$TCR, plate = 5, error_drop = c(.15, .01), error_seq = c(.05, .001), error_mode = c("lognormal", "lognormal"), skewed = 10, prop_top = 0.6, dist = "linear", numb_cells = matrix(c(50, 480), ncol = 2)) ## Not run: # normally want to set replicates to 100 or more pairs <- bagpipe(alpha = dat$alpha, beta = dat$beta, replicates = 5, frac = 0.75, bootstrap = FALSE) # using a threshold of 0.3 of replicates pairs <- pairs[pairs[, 5] > 0.3, ] ## End(Not run)
chain_scores()
calculates association scores between every pair of
alpha and beta chains based on the number of concurrent well appearances
each alpha and beta pair makes, scaled inversely by the number of unique
chains in that well. See Lee et. al. for more information about this
procedure.
chain_scores(data_a, data_b)
chain_scores(data_a, data_b)
data_a |
Matrix recording which alpha chains appear in each well of the
data. See |
data_b |
Matrix recording which beta chains appear in the each well of the
data. See |
A list containing the alpha and beta association scores. Accessed
with list$ascores
and list$bscores
respectively.
# see the help for create_clones() and create_data() clones <- create_clones(numb_beta = 1000, dual_alpha = .3, dual_beta = .06, alpha_sharing = c(0.80, 0.15, 0.05), beta_sharing = c(0.75, 0.20, 0.05)) dat <- create_data(clones$TCR, plate = 5, error_drop = c(.15, .01), error_seq = c(.05, .001), error_mode = c("lognormal", "lognormal"), skewed = 10, prop_top = 0.6, dist = "linear", numb_cells = matrix(c(50, 480), ncol = 2)) #this is done internally in bagpipe() scores <- chain_scores(data_a = dat$alpha, data_b = dat$beta) scores <- scores$ascores + t(scores$bscores)
# see the help for create_clones() and create_data() clones <- create_clones(numb_beta = 1000, dual_alpha = .3, dual_beta = .06, alpha_sharing = c(0.80, 0.15, 0.05), beta_sharing = c(0.75, 0.20, 0.05)) dat <- create_data(clones$TCR, plate = 5, error_drop = c(.15, .01), error_seq = c(.05, .001), error_mode = c("lognormal", "lognormal"), skewed = 10, prop_top = 0.6, dist = "linear", numb_cells = matrix(c(50, 480), ncol = 2)) #this is done internally in bagpipe() scores <- chain_scores(data_a = dat$alpha, data_b = dat$beta) scores <- scores$ascores + t(scores$bscores)
combine_freq_results()
combines the results of the frequency estimation
performed on single TCR clones (from the output of bagpipe
) and
the frequency estimation performed on dual clones. The code will find the
rows of the single TCR frequency results that are represented by the dual
clones and replace them with the appropriate dual clone entry.
combine_freq_results(single, dual)
combine_freq_results(single, dual)
single |
Frequency estimation results of single TCR clones (usually from
the first time |
dual |
Frequency estimation results of dual TCR-alpha clones |
A data.frame with the same structure as the output of
freq_estimate
. If two single "clones" in the single
data.frame is represented by a dual clone in dual
, then it is
removed and replaced with one row represented by the dual clone.
create_clones()
creates a set of (beta1, beta2, alpha1, alpha2)
quadruples that represent the indices of the chains of clones. The function
will take a fixed number of unique beta chains that are in the T cell
population, and then use the degree of beta and alpha sharing to determine
the number of unique alpha chains in the populations. These chains will
then be randomly assigned to each other, with a proportion of them being
dual TCR clones (i.e. alpha1 != alpha2 and/or beta1 != beta2), forming
our random list of clones with their chain indices.
create_clones(numb_beta, dual_beta, dual_alpha, alpha_sharing, beta_sharing)
create_clones(numb_beta, dual_beta, dual_alpha, alpha_sharing, beta_sharing)
numb_beta |
The number of unique betas in the clonal population |
dual_beta |
The proportion of clone that are dual TCRbeta clones, i.e. has two distinct beta chains |
dual_alpha |
The proportion of clones that are dual TCRalpha clones, i.e. has two distinct alpha chains |
alpha_sharing |
A vector where the ith position represents the proportion of alpha chains that are shared by i clones; alpha chains can be shared by up to 7 clones |
beta_sharing |
A vector where the ith position represents the proportion of beta chains that are shared by i clones; beta chains can be shared by up to 5 clones |
A list of four different matrices. Each matrix is has dimensions n x 4, where n is the total number of clones and each row represents the chains of a clone. Column 1 and column 2 are the beta index/indices of the beta chain(s) used by the clone. Column 3 and 4 are the alpha index/indices of the alpha chain(s) used by the clone. If a clone has a single beta chain, then col 1 and col 2 will be equal. If a clone has a single alpha chain, then col 3 and col 4 will be equal.
# Creating a population containing 1000 beta chains; 10% of clones with # dual-beta TCRs and 30% of clones with dual TCRs; 75% beta shared by one # clone, 20% by two clones, 5% by three clones; 80% alpha chains shared by # one clone, 15% by two clones, and 5% by three clones clones <- create_clones(numb_beta = 1000, dual_alpha = .3, dual_beta = .06, alpha_sharing = c(0.80, 0.15, 0.05), beta_sharing = c(0.75, 0.20, 0.05))
# Creating a population containing 1000 beta chains; 10% of clones with # dual-beta TCRs and 30% of clones with dual TCRs; 75% beta shared by one # clone, 20% by two clones, 5% by three clones; 80% alpha chains shared by # one clone, 15% by two clones, and 5% by three clones clones <- create_clones(numb_beta = 1000, dual_alpha = .3, dual_beta = .06, alpha_sharing = c(0.80, 0.15, 0.05), beta_sharing = c(0.75, 0.20, 0.05))
create_data()
simulates an alphabetr sequencing experiment by sampling
clones from a clonal structure specified by the user. The clones are
placed on a frequency distribution where a fixed number clones represents
the top proportion of the population in frequency and the other clones
represent the rest of the population in frequency. Different error models
and different sampling strategies can be simulated as well.
create_data(TCR, plates, error_drop, error_seq, error_mode, skewed, prop_top, dist = "linear", numb_cells)
create_data(TCR, plates, error_drop, error_seq, error_mode, skewed, prop_top, dist = "linear", numb_cells)
TCR |
The specified clonal structure, which can be created from
|
plates |
The number of plates of data. |
error_drop |
A vector of length 2 with the mean of the drop error rate and the sd of the drop error rate. |
error_seq |
A vector of length 2 with the mean of the in-frame error rate and the sd of the in-frame error rate. |
error_mode |
A vector of two strings determining the "mode" of the error
models. The first element sets the mode of the drop errors, and the second
element sets the mode of the in-frame errors. The two modes available are
"constant" for a constant error rate and "lognormal" for error rates
drawn from a lognormal distribution. If the mode is set to "constant" the
sd specified in |
skewed |
Number of clones represent the top proportion of the population
by frequency (which is specified by |
prop_top |
The proportion of the population in frequency represented by
the number of clones specified by |
dist |
The distribution of frequency of the top clones. Currently only "linear" is available. |
numb_cells |
A two column matrix determining the sampling strategy of the experiment. The first column represents the number of cells per well, and the second column represents the number of wells with that sample size. The sum of column 2 should equal 96 times the number of plates. |
A list of length 2. The first element is a matrix representing the data of the alpha chains, and the second element is a matrix representing the data of beta chains. The matrix represents the sequencing data by representing the wells of the data by rows and the chain indices by column. Entry [i, j] of the matrix represents if chain j is found in well i (yes == 1, no == 0). e.g. if alpha chain 25 is found in well 10, then [10, 25] of the alpha matrix will be 1.
# see the help for create_clones() for details of this function call clones <- create_clones(numb_beta = 1000, dual_alpha = .3, dual_beta = .06, alpha_sharing = c(0.80, 0.15, 0.05), beta_sharing = c(0.75, 0.20, 0.05)) # creating a data set with 5 plates, lognormal error rates, 10 clones # making up the top 60% of the population in frequency, and a constant # sampling strategy of 50 cells per well for 480 wells (five 96-well plates) dat <- create_data(clones$TCR, plate = 5, error_drop = c(.15, .01), error_seq = c(.05, .001), error_mode = c("lognormal", "lognormal"), skewed = 10, prop_top = 0.6, dist = "linear", numb_cells = matrix(c(50, 480), ncol = 2))
# see the help for create_clones() for details of this function call clones <- create_clones(numb_beta = 1000, dual_alpha = .3, dual_beta = .06, alpha_sharing = c(0.80, 0.15, 0.05), beta_sharing = c(0.75, 0.20, 0.05)) # creating a data set with 5 plates, lognormal error rates, 10 clones # making up the top 60% of the population in frequency, and a constant # sampling strategy of 50 cells per well for 480 wells (five 96-well plates) dat <- create_data(clones$TCR, plate = 5, error_drop = c(.15, .01), error_seq = c(.05, .001), error_mode = c("lognormal", "lognormal"), skewed = 10, prop_top = 0.6, dist = "linear", numb_cells = matrix(c(50, 480), ncol = 2))
create_data_singlecells()
simulates a single-cell sequencing
experiment by sampling clones from a clonal structure specified by the user
and using the same error models and frequency distributions used in
create_data
. These functions are almost identical except this
one simulates the sampling and sequencing of single T cells.
create_data_singlecells(TCR, plates = 5, error_drop = c(0.15, 0.01), error_seq = c(0.05, 0.01), error_mode = c("constant", "constant"), skewed = 15, prop_top = 0.5, dist = "linear")
create_data_singlecells(TCR, plates = 5, error_drop = c(0.15, 0.01), error_seq = c(0.05, 0.01), error_mode = c("constant", "constant"), skewed = 15, prop_top = 0.5, dist = "linear")
TCR |
The specified clonal structure, which can be created from
|
plates |
The number of plates of data. The number of single-cells is 96
times |
error_drop |
A vector of length 2 with the mean of the drop error rate and the sd of the drop error rate. |
error_seq |
A vector of length 2 with the mean of the in-frame error rate and the sd of the in-frame error rate. |
error_mode |
A vector of two strings determining the "mode" of the error
models. The first element sets the mode of the drop errors, and the second
element sets the mode of the in-frame errors. The two modes available are
"constant" for a constant error rate and "lognormal" for error rates
drawn from a lognormal distribution. If the mode is set to "constant" the
sd specified in |
skewed |
Number of clones represent the top proportion of the population
by frequency (which is specified by |
prop_top |
The proportion of the population in frequency represented by
the number of clones specified by |
dist |
The distribution of frequency of the top clones. Currently only "linear" is available. |
A list of length 3. The first element is a matrix representing the data of the alpha chains ($alpha), and the second element is a matrix representing the data of beta chains ($beta). The matrix represents the sequencing data by representing the wells of the data by rows and the chain indices by column. Entry [i, j] of the matrix represents if chain j is found in well i (yes == 1, no == 0). e.g. if alpha chain 25 is found in well 10, then [10, 25] of the alpha matrix will be 1.
The third element of the list ($drop) is a matrix that records the index of the clone sampled in the well (col 1), records if a drop error occurred (col 2), and record if an in-frame error occurred (col 3).
# see the help for create_clones() for details of this function call clones <- create_clones(numb_beta = 1000, dual_alpha = .3, dual_beta = .06, alpha_sharing = c(0.80, 0.15, 0.05), beta_sharing = c(0.75, 0.20, 0.05)) # creating a data set with 480 single cells, lognormal error rates, 10 clones # making up the top 60% of the population in frequency, and a constant # sampling strategy of 50 cells per well for 480 wells (five 96-well plates) dat <- create_data_singlecells(clones$TCR, plate = 5, error_drop = c(.15, .01), error_seq = c(.05, .001), error_mode = c("lognormal", "lognormal"), skewed = 10, prop_top = 0.6, dist = "linear")
# see the help for create_clones() for details of this function call clones <- create_clones(numb_beta = 1000, dual_alpha = .3, dual_beta = .06, alpha_sharing = c(0.80, 0.15, 0.05), beta_sharing = c(0.75, 0.20, 0.05)) # creating a data set with 480 single cells, lognormal error rates, 10 clones # making up the top 60% of the population in frequency, and a constant # sampling strategy of 50 cells per well for 480 wells (five 96-well plates) dat <- create_data_singlecells(clones$TCR, plate = 5, error_drop = c(.15, .01), error_seq = c(.05, .001), error_mode = c("lognormal", "lognormal"), skewed = 10, prop_top = 0.6, dist = "linear")
dual_discrim_dual_likelihood()
is used within dual_top
to calculate the likelihood that two alpha-beta pairs identified by
bagpipe
sharing the same beta chain derive from a single
dual-alpha clone (instead of two distinct clones sharing the same beta).
dual_discrim_dual_likelihood(est, err, numb_cells, numb_wells, binomials)
dual_discrim_dual_likelihood(est, err, numb_cells, numb_wells, binomials)
est |
Frequency estimate of the putative dual-alpha clone |
err |
Mean drop error rate |
numb_cells |
Vector containing the number of cells per well |
numb_wells |
Vector containing the number of wells with the sample
sizes given by |
binomials |
Calculations of the needed binomial coefficients; this is faster in R than in Rcpp (from my own tests) |
A numeric containing the negative log likelihood
dual_eval()
is used in simulation situations to compare the duals
determined by dual_top
and dual_tail
(which can
be combined with rbind()
) to the duals in the simulated T cell
population.
dual_eval(duals, pair, TCR, number_skewed, TCR_dual)
dual_eval(duals, pair, TCR, number_skewed, TCR_dual)
duals |
A 4 column matrix (col 1 + 2 = beta indices, col 3 + 4 = alpha
indices) containing the indices of dual-alpha clones. The output of
|
pair |
The output of |
TCR |
The clonal structure of the simulated T cell population. This is
obtained by subsetting the |
number_skewed |
The number of clones represent the top proportion of
the T cell population by frequency (this is the same |
TCR_dual |
The dual clones of the simulated T cell population. This is
obtained by subsetting the |
A data.frame with the following columns:
fdr
, the false dual rate
numb_cand_duals
, the number of duals identified
adj_depth_top
, the adjusted dual depth of top clones
abs_depth_top
, the absolute dual depth of top clones
numb_correct_top
, the number of correctly identified dual
clones in the top
numb_duals_ans_top
, the number of top dual clones in the
simulated T cell population
numb_poss_top
, the number of top dual clones whose beta and
both alpha chains were identified by bagpipe()
numb_unestimated_top
, number of top dual clones whose
frequencies could not be calculated (usually because the clones
appeared in every well of the data)
adj_depth_tail
, the adjusted dual depth of tail clones
abs_depth_tail
, the absolute dual depth of tail clones
numb_correct_tail
, the number of correctly identified tail
clones
numb_duals_ans_tail
, the number of dual tail clones in the
simulated T cell population
numb_poss_tail
, the number of tail dual cloens whose beta
and both alpha chains were identified by bagpipe()
numb_unestimated_tail
, the number of tail clones whose
frequencies could not be calculated
dual_tail()
distinguishes between clones that share a common beta
chain and dual TCR clones with two productive alpha chains. The procedure
tests the null hypothesis that two candidate alpha, beta pairs with the
same beta represent two separate clones by using the frequency estimates
to calculate the number of wells that both clones are expected to be in.
This is compared to the actual number of wells that both clones appear in,
and if the actual number is greater than the expected number, than the
pairs are chosen to represent a dual TCR clone.
dual_tail(alpha, beta, freq_results, numb_cells)
dual_tail(alpha, beta, freq_results, numb_cells)
alpha |
Matrix recording which alpha chains appear in each well of the
data. See |
beta |
Matrix recording which beta chains appear in the each well of the
data. See |
freq_results |
Output of |
numb_cells |
Vector containing the number of cells sampled in the wells of each column of the plates. |
A n x 3 matrix where n is the number of candidate clones, column 1 is the beta index of the clone, and column 2-3 are the alpha indices of the clone
dual_top()
distinguishes between clones that share a common beta
chain and dual TCR clones with two productive alpha chains. The procedure
calculates the likelihood that two (alpha, beta) pairs (with common a beta
chain) come from two distinct clones sharing the same beta chain vs the
likelihood that the two pairs derive from a dual TCR-alpha clone.
A significant difference between the two likelihoods is indicative of a
dual alpha clone, and these clones are returned as dual clones.
dual_top(alpha, beta, pair, error, numb_cells)
dual_top(alpha, beta, pair, error, numb_cells)
alpha |
Matrix recording which alpha chains appear in each well of the
data. See |
beta |
Matrix recording which beta chains appear in the each well of the
data. See |
pair |
A matrix where each row is a beta/alpha pair, column 1 and 2 are the beta indices, and column 3 and 4 are the alpha indices, and column 5 is the proportion of replicates the clone was found in (or equal to -1 if the clone is dual) |
error |
The mean error "dropped" chain rate due to PCR or sequencing errors. |
numb_cells |
The number of cells per well in each column of the plates. Should be a vector of 12 elements. |
A matrix of dual-alpha clones, where col 1 and 2 are beta indices of the clone (which should be equal) and col 3 and 4 are alpha indices of the clone (which are different).
alphabetr
freq_estimate()
estimates the frequencies of clones with confidence
intervals by using a maximum likelihood approach. The function looks at
the wells that a chains of a clone appear in and determines the most
likely frequency that explains the data.
freq_estimate(alpha, beta, pair, error = 0.15, numb_cells)
freq_estimate(alpha, beta, pair, error = 0.15, numb_cells)
alpha |
Matrix recording which alpha chains appear in each well of the
data. See |
beta |
Matrix recording which beta chains appear in the each well of the
data. See |
pair |
A matrix where each row is a beta/alpha pair, column 1 and 2 are the beta indices, and column 3 and 4 are the alpha indices, and column 5 is the proportion of replicates the clone was found in (or equal to -1 if the clone is dual) |
error |
The mean error "dropped" chain rate due to PCR or sequencing errors. |
numb_cells |
The number of cells per well in each column of the plates. Should be a vector of 12 elements. |
A data frame with frequency estimates and confidence intervals
## Not run: # obtained from the output of bagpipe() pairs <- pairs[pairs[, 5] > 0.3, ] freq <- freq_estimate(alpha = dat$alpha, beta = dat$beta, pair = pairs, numb_cells = matrix(c(50, 480), ncol = 2)) ## End(Not run)
## Not run: # obtained from the output of bagpipe() pairs <- pairs[pairs[, 5] > 0.3, ] freq <- freq_estimate(alpha = dat$alpha, beta = dat$beta, pair = pairs, numb_cells = matrix(c(50, 480), ncol = 2)) ## End(Not run)
freq_eval()
will evaluated how well freq_estimate
performed
by calculating the precision and CV of the frequency estimates for the top
clones and by determining the proportion of the top clones whose true clonal
frequency lies in the 95-percent CI determined by freq_estimate
freq_eval(freq, number_skewed, TCR, numb_clones, prop_top)
freq_eval(freq, number_skewed, TCR, numb_clones, prop_top)
freq |
The output of |
number_skewed |
The number of clones represent the top proportion of
the T cell population by frequency (this is the same |
TCR |
The clonal structure of the simulated T cell population. This is
obtained by subsetting the |
numb_clones |
Total number of distinct clones in the parent population |
prop_top |
The proportion of the population in frequency represented by
the number of clones specified by |
A list with the precision, cv, and accuracy of the frequency estimation.
Calculate likelihood curve of frequency estimates for a dual-alpha or dual-beta TCR clone
likelihood_dual(est, err, numb_wells, numb_cells, numb_sample)
likelihood_dual(est, err, numb_wells, numb_cells, numb_sample)
est |
Clonal frequency estimate |
err |
Mean drop error rate |
numb_wells |
A vector with the number of wells with the distinct sample sizes |
numb_cells |
A vector of the distinct sample sizes, i.e. the number of cells per well |
numb_sample |
A vector with the number of wells of the sample size of
the same position of |
A numeric with the negative log likelihood
Calculate likelihood curve of frequency estimates for a dual-alpha and dual-beta TCR clone
likelihood_dualdual(est, err, numb_wells, numb_cells, numb_sample)
likelihood_dualdual(est, err, numb_wells, numb_cells, numb_sample)
est |
Clonal frequency estimate |
err |
Mean drop error rate |
numb_wells |
A vector with the number of wells with the distinct sample sizes |
numb_cells |
A vector of the distinct sample sizes, i.e. the number of cells per well |
numb_sample |
A vector with the number of wells of the sample size of
the same position of |
A numeric with the negative log likelihood
Calculate likelihood curve of frequency estimates for a single TCR clone
likelihood_single(est, err, numb_wells, numb_cells, numb_sample)
likelihood_single(est, err, numb_wells, numb_cells, numb_sample)
est |
Clonal frequency estimate |
err |
Mean drop error rate |
numb_wells |
A vector with the number of wells with the distinct sample sizes |
numb_cells |
A vector of the distinct sample sizes, i.e. the number of cells per well |
numb_sample |
A vector with the number of wells of the sample size of
the same position of |
A numeric with the negative log likelihood
read_alphabetr()
will read in two different forms of a csv file to
convert sequencing data using the alphabetr approach into the binary
matrices required by bagpipe
. The csv file(s) can have one of
two forms. (1) A single csv file with three columns: column 1 containing
whether the sequence is "TCRA" or "TCRB"; column 2 containing the well
number; and column 3 containing the CDR3 sequence
(2) Two CSV files, one for TCRA and one for TCRB, with two columns: column
1 containing the well number and column 2 containing the CDR3 sequence
read_alphabetr(data = NULL, data_alpha = NULL, data_beta = NULL)
read_alphabetr(data = NULL, data_alpha = NULL, data_beta = NULL)
data |
To read in a 3-column csv file containing both TCRA and TCRB sequencing information |
data_alpha |
To read in a 2-column csv file containing TCRA sequencing
information. Must be used in conjunction with the |
data_beta |
To read in a 2-column csv file containing TCRB sequencing
information. Must be used in conjunction with the |
A list of two binary matrices that represent the sequencing data and two character vectors that give the CDR3 sequences associated with each chain index.
## Not run: dat <- read_alphabetr(data = "alphabetr_data.csv") # saving the alpha and beta binary matrices data_alpha <- dat$alpha data_beta <- dat$beta # finding the cdr3 sequences of alpha_2 and beta_4 respectively cdr3_alpha2 <- dat$alpha_lib[2] cdr3_beta4 <- dat$beta_lib[4] ## End(Not run)
## Not run: dat <- read_alphabetr(data = "alphabetr_data.csv") # saving the alpha and beta binary matrices data_alpha <- dat$alpha data_beta <- dat$beta # finding the cdr3 sequences of alpha_2 and beta_4 respectively cdr3_alpha2 <- dat$alpha_lib[2] cdr3_beta4 <- dat$beta_lib[4] ## End(Not run)