Monday, September 26, 2022

Drivers and determinants of strain dynamics following fecal microbiota transplantation


Data overview

The study dataset comprised 22 independent cohorts recruited in centers in the United States, the Netherlands and Australia, with a total of 316 FMTs conducted in 311 patients suffering from rCDI (n = 62 FMTs (refs. 26,27,28,32,36)), infection with ESBL (n = 59 (refs. 37,38,39)), MetS (n = 50 (refs. 18,25,40)), UC (n = 42 (refs. 29,41,42,43)), anti-PD1 therapy resistance in patients with melanoma (n = 37 (refs. 9,10)), IBS (n = 30 (ref. 44)), Crohn’s disease (n = 18 (ref. 45)), chemotherapy-induced diarrhea in patients with renal carcinoma (n = 10 (ref. 46)), Tourette’s syndrome (n = 5 (ref. 47)) and in healthy volunteers (n = 3 (ref. 48)). On average, 4.11 recipient stool samples were available per FMT time series, including baseline samples taken before the intervention (pre-FMT). In total, 7.9 terabases (Tb) of sequencing data were analyzed across 1,492 fecal metagenomes, of which 269 (for 76 time series) were generated as part of the present study (for cohorts UC_NL, ESBL_NL, MetS_NL_1 and div_AU).

Three cohorts (UC_NL, MetS_NL_1 and MetS_NL_Koopen) were randomized controlled trials in which a subset of patients received autologous FMTs (transplantation of the recipient’s own stool, n = 33 FMTs). All other FMTs (n = 283) were allogenic, using stool donors. For 228 FMT time series, a full complement of donor baseline, recipient baseline and at least one recipient post-FMT sample was available after filtering.

A full description of all cohorts is provided in Supplementary Table 1, detailed information per FMT time series in Supplementary Table 2 and per-sample information in Supplementary Table 3.

Sample collection, processing and metagenomic sequencing

Study design and fecal sample collection for cohorts MetS_NL_1 (refs. 18,25), UC_NL (refs. 41,61) and ESBL_NL (ref. 37) were described previously. rCDI_AU and UC_AU samples were obtained from a single-center, proof-of-concept, parallel and controlled study in collaboration with the Centre for Digestive Diseases (Sydney, Australia), which aimed to assess donor microbiota implantation in two patients with CDI and three with UC up to 28 days following a 2-day fecal microbiota transplantation infusion via transcolonoscopy and rectal enema. The study is registered with the Australian New Zealand Clinical Trials Registry under ACTRN12614000503628 (Universal Trial no. U1111-1156-5909). Written, informed participant consent and ethical approval were obtained via the Centre for Digestive Diseases Human Research Ethics Committee. Deidentified participant data associated with the study are provided in Supplementary Tables 2 and 3.

For cohorts MetS_NL_1 and UC_NL, fecal DNA extraction was described in the original studies. DNA from ESBL_NL samples was extracted using the GNOME DNA Isolation Kit (MP Biomedicals) with the following minor modifications: cell lysis/denaturation was performed (30 min, 55 °C) before protease digestion was carried out overnight (55 °C), and RNAse digestion (50 μl, 30 min, 55 °C) was performed after mechanical lysis. After final precipitation, DNA was resuspended in TE buffer and stored at −20 °C for further analysis.

Metagenomic sequencing libraries for MetS_NL_1, UC_NL, ESBL_NL and div_AU samples were prepared to a target insert size of 350–400 base pairs (bp) on a Biomek FXp Dual Hybrid with high-density layout adapters, orbital shaker, static peltier and shaking peltier (Beckman Coulter) and a robotic PCR cycler (Biometra), using SPRIworks HT kits (Beckman Coulter) according to the supplier’s recommendation, with the following modifications: 500 ng of DNA initially, adapter dilution 1:25, kit chemical dilution 1:1 in process. For samples with low-input DNA concentrations, libraries were instead prepared manually using NEBNext Ultra II DNA Library Prep kits with NEBNext Singleplex primers. Libraries were sequenced on an Illumina HiSeq 4000 platform with 2 × 150-bp paired-end reads.

Public datasets

Based on a literature search, 18 datasets on FMT cohorts that met the following criteria were included in the study: (1) public availability of metagenomic sequencing data in January 2022; (2) sufficient available description to unambiguously match donors and recipients per FMT time series; and (3) no restrictions on data reuse. They were included in this study as RCDI_US_Smillie (n = 22 FMT time series (ref. 26)), RCDI_US_Aggarwala (n = 14 (ref. 28)), RCDI_US_Watson (n = 10 (ref. 32)), RCDI_US_Podlesny (n = 8 (ref. 27)), RCDI_US_Moss (n = 6 (ref. 36)), MetS_NL_Koopen (n = 24 (ref. 40)), UC_US_Damman (n = 6 (ref. 43)), UC_US_Nusbaum (n = 4 (ref. 42)), UC_US_Lee (n = 2 (ref. 29)), CD_US_Vaughn (n = 18 (ref. 45)), ABXR_div_Leo (n = 26 (ref. 39)), ABXR_IS_BarYoseph (n = 14 (ref. 38)), IBS_NO_Goll (n = 30 (ref. 44)), MEL_US_Davar (n = 27 (ref. 10)), MEL_US_Baruch (n = 10 (ref. 9)), REN_IT_Ianiro (n = 10 (ref. 46)), TOU_CN_Zhao (n = 5 (ref. 47)) and CTR_RU_Goloshchapov (n = 3 (ref. 48)). Contextual data, including donor–recipient matchings and information about clinical response, were curated from the study publications and, in some cases, kindly amended by the studies’ original authors on request (Supplementary Tables 1–3).

Metagenomic data processing and taxonomic and functional profiling

Metagenomic reads were quality trimmed to remove base calls with a Phred score of <25. Reads were then discarded if they were <45 nucleotides or if they mapped to the human genome (GRCh38.p10) with at least 90% identity over 45 nucleotides. This processing was performed using NGLess (ref. 62). Taxonomic profiles per sample were obtained using mOTUs v.2 (ref. 63). For functional profiling, reads were mapped against the Global Microbial Gene Catalog v.1 gut subcatalogue (gmgc.embl.de; ref. 64) with a minimum match length of 45 nucleotides with at least 97% identity, and summarized based on antimicrobial resistance gene (ARG) annotations and Kyoto Encyclopedia of Genes and Genomes orthologs (KOs) via eggNOG annotations (ref. 65). Based on the resulting KO profiles, GMMs (ref. 66) were quantified in each sample using omixer-rpmR (v.0.3.2) (ref. 67). Taxonomic and GMM profiles per sample, normalized by read depth, are available in Supplementary Tables 7 and 8.


We demarcated MAGs from samples of studies MetS_NL_1, UC_NL, ABXR_NL, div_AU, RCDI_US_Smillie, RCDI_US_Moss, UC_US_Damman, UC_US_Nusbaum, UC_US_Lee and CD_US_Vaughn using multiple complementary strategies to obtain both high resolution from sample-specific assemblies and deep coverage of lowly abundant species from coassemblies of multiple samples. Unless otherwise indicated, all tools in the following were run with default parameters.

To generate single-sample MAGs, fecal metagenomes were assembled individually using metaSPAdes v.3.12.0 (ref. 68), reads were mapped back to contigs using bwa-mem v.0.7.17 (ref. 69) and contigs were binned using metaBAT v.2.12.1 (ref. 70). Multisample MAGs were built for each cohort separately. Reads were first coassembled using megahit v.1.1.3 (ref. 71) and mapped back to contigs using bwa-mem v.0.7.17. Coassembled contigs were then binned using both CONCOCT v.0.5.0 (ref. 72) and metaBAT v.2.12.1. The resulting coassembled MAG sets were further refined using DAS TOOL (ref. 73) and metaWRAP (ref. 74). In total, 47,548 MAGs were demarcated using these five approaches (single-sample MAGs, multisample coassembled CONCOCT, metaBAT2, DAS TOOL and metaWRAP MAGs). In addition, we included 25,037 high-quality reference genomes from the proGenomes database (refs. 75,76) in downstream analyses.

Genome quality was estimated using CheckM (ref. 77) and GUNC v.0.1 (ref. 78), and all genomes were taxonomically classified using GTDB-tk (ref. 79). Open reading frames (ORFs) were predicted using prodigal (ref. 80) and annotated via prokka workflow v.1.14.6 (ref. 81). Orthologs to known gene families were detected using eggNOG-mapper v.1 (ref. 82). ARGs were annotated using a workflow combining information from databases CARD v.3.0.0 (via rgi v.4.2.4 (ref. 83)) and ResFams v.1.2.2 (ref. 84), as described previously (ref. 76). The ‘specI’ set of 40 near-universal single-copy marker genes was detected in each genome using fetchMG (ref. 85).

The full set of generated MAGs and contextual data are available via Zenodo (DOI 10.5281/zenodo.5534163 (ref. 86)).

Genome clustering, species metapangenomes and phylogeny

Genomes were clustered into species-level groups using an ‘open-reference’ approach in multiple steps. Initial prefiltering using lenient quality criteria (CheckM-estimated completeness ≥70%, contamination ≤25%; additional criteria were applied downstream) removed 57.7% of MAGs. The remaining 20,093 MAGs were mapped to the clustered proGenomes v.1 (ref. 75) and mOTUs v.2 (ref. 63) taxonomic marker gene databases using MAPseq v.1.2.3 (ref. 87). A total of 17,720 MAGs were confidently assigned to a ref-mOTU (specI cluster) or meta-mOTU based on the following criteria: (1) detection of at least 20% of the screened taxonomic marker genes and (2) a majority of markers assigning to the same mOTU at a conservative MAPseq confidence threshold of ≥0.9.
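The two filtering steps above can be illustrated with a minimal Python sketch. The function names, the input layout (one `(mOTU id, confidence)` tuple per detected marker gene) and the reading of "majority" as more than half of the detected markers are assumptions made for illustration; they are not taken from the original workflow.

```python
from collections import Counter

def passes_prefilter(completeness, contamination):
    """Lenient quality prefilter: completeness >= 70% and contamination <= 25%."""
    return completeness >= 70.0 and contamination <= 25.0

def assign_motu(marker_hits, n_screened, min_detected=0.2, min_conf=0.9):
    """Assign a MAG to an mOTU, or return None if the criteria are not met.

    marker_hits: list of (motu_id, confidence) tuples, one per detected marker
                 gene (hypothetical layout).
    n_screened:  number of taxonomic marker genes that were screened.
    """
    # Criterion 1: at least 20% of the screened marker genes were detected.
    if n_screened <= 0 or len(marker_hits) / n_screened < min_detected:
        return None
    # Criterion 2: a majority of detected markers point to the same mOTU
    # at a confidence of >= 0.9 (assumed interpretation of "majority").
    confident = [motu for motu, conf in marker_hits if conf >= min_conf]
    if not confident:
        return None
    motu, count = Counter(confident).most_common(1)[0]
    return motu if count > len(marker_hits) / 2 else None
```

For example, a MAG with three detected markers out of ten screened, two of which confidently hit the same mOTU, would be assigned to that mOTU under these assumptions.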

In an independent approach, quality-filtered MAGs and reference genomes were also clustered by average nucleotide identity (ANI) using a modified and scalable reimplementation of the dRep workflow (ref. 88). Using pairwise distances computed with mash v.2.1 (ref. 89), sequences were first preclustered to 90% mash-ANI using the single-linkage algorithm, ensuring that all genome pairs sharing ≥90% mash-ANI were grouped together. Each mash precluster was then resolved to 95 and 99% average linkage ANI clusters using fastANI v.1.1 (ref. 90). For each cluster, a representative genome was picked as either the corresponding reference specI cluster representative in the proGenomes database or the MAG with the highest dRep score (calculated based on estimated completeness and contamination). Genome partitions based on 95% average linkage ANI clustering and specI marker gene mappings matched almost perfectly, at an adjusted Rand index of >0.99. We therefore defined a total of 1,089 species-level clusters (‘species’) from our dataset (Supplementary Table 4), based on marker gene mappings to precomputed ref-mOTUs (or specI clusters, n = 295) and meta-mOTUs (n = 528), and as 95% average linkage ANI clusters for genomes that did not map to either of these databases (n = 233).
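Single-linkage preclustering guarantees that any two genomes sharing at least 90% ANI end up in the same precluster, even if other members of that precluster are less similar to each other. The sketch below illustrates this property with a union-find structure; the function name and the dictionary of pairwise ANI values are hypothetical and do not reproduce the dRep reimplementation itself.

```python
def single_linkage_clusters(genomes, pairwise_ani, threshold=0.90):
    """Group genomes so that every pair with ANI >= threshold shares a cluster.

    genomes:      iterable of genome identifiers.
    pairwise_ani: dict mapping (genome_a, genome_b) -> ANI in [0, 1]
                  (hypothetical input; e.g. derived from mash distances).
    """
    parent = {g: g for g in genomes}

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), ani in pairwise_ani.items():
        if ani >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())
```

In a toy case where g1 and g2 share 95% ANI, g2 and g3 share 91% and g1 and g3 share only 85%, all three genomes fall into one precluster because they are chained through g2. Such preclusters are what the subsequent average-linkage step would then resolve more finely.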

Species pangenomes were generated by clustering all genes within each species-level cluster at 95% amino acid identity, using Roary 3.12.0 (ref. 91). Spurious and putatively contaminant gene clusters (as introduced by misbinned contigs in MAGs) were removed by requiring that the underlying gene sequences originated (1) from a reference genome in the proGenomes database or (2) from at least two independent MAGs, assembled from distinct samples or studies. To account for incomplete genomes, ‘extended core genes’ were defined as gene clusters present in >80% of genomes in a species-level cluster. If too few gene clusters satisfied this criterion, as was the case for some pangenomes containing many incomplete MAGs, the 50 most prevalent gene clusters were used instead. Representative sequences for each gene cluster were picked as ORFs originating from specI representative genomes (that is, high-quality reference genomes), or otherwise as the longest ORF in the cluster.
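The extended core gene definition can be sketched as follows. The text does not state how many gene clusters count as "too few"; the sketch assumes, purely for illustration, that the fallback applies when fewer than 50 clusters pass the prevalence cutoff. The function name and input layout are likewise hypothetical.

```python
def extended_core_genes(cluster_genome_counts, n_genomes,
                        min_prevalence=0.8, fallback_n=50):
    """Return the extended core gene clusters of one species pangenome.

    cluster_genome_counts: dict mapping gene cluster id -> number of genomes
                           in the species-level cluster that carry it.
    n_genomes:             total number of genomes in the species-level cluster.
    """
    # Primary rule: gene clusters present in > 80% of genomes.
    core = [c for c, k in cluster_genome_counts.items()
            if k / n_genomes > min_prevalence]
    # Assumed trigger for the fallback: fewer than `fallback_n` clusters pass.
    if len(core) >= fallback_n:
        return sorted(core)
    # Fallback: take the `fallback_n` most prevalent gene clusters instead
    # (ties broken by cluster id to keep the result deterministic).
    ranked = sorted(cluster_genome_counts,
                    key=lambda c: (-cluster_genome_counts[c], c))
    return sorted(ranked[:fallback_n])
```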

A phylogenetic tree of species-level cluster representatives was inferred based on the ‘mOTU’ set of ten near-universal marker genes (ref. 63). Marker genes were aligned in amino acid sequence space across all species using Muscle v.3.8.31 (ref. 92), concatenated and then used to construct a species tree with FastTree2 (v.2.1.11) (ref. 93) with default parameters.

Inference of microbial strain populations

Metagenomic reads for each sample were mapped against gene cluster representative sequences for all species pangenomes using bwa-mem v.0.7.17 (ref. 69). Mapped reads were filtered for matches of ≥45 bp and ≥97% sequence identity, sorted and filtered against multiple mappings using samtools v.1.7 (ref. 94). Horizontal (‘breadth’) and vertical (‘depth’) coverage of each gene cluster in each sample were calculated using bedtools v.2.27.1 (ref. 95).

A species was considered present in a sample if at least three mOTU taxonomic marker genes were confidently detected either via the mOTU v.2 profiler (for specI clusters and meta-mOTUs) or based on pangenome-wide read mappings (for non-mOTU species-level clusters). Gene clusters within each pangenome were considered present in a sample if (1) the species was detectable (see above), (2) horizontal coverage exceeded 100 bp and 20% of the representative gene’s length and (3) average vertical coverage exceeded 0.5. Gene clusters were considered confidently absent if they did not attract any mappings in samples where the species’ set of extended core genes (see above) was covered at >1 median vertical coverage (that is, present with high confidence). Using these criteria, strain population-specific gene content profiles were computed for each species in each sample.
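The presence and absence rules above translate into a few simple predicates. The following sketch uses hypothetical function and parameter names and assumes that coverage statistics have already been computed per gene cluster and sample.

```python
def species_present(n_marker_genes_detected):
    """A species counts as present if >= 3 mOTU marker genes are detected."""
    return n_marker_genes_detected >= 3

def gene_cluster_present(species_detected, covered_bp, gene_length, mean_depth):
    """Apply the three presence criteria for one gene cluster in one sample."""
    if not species_detected:
        return False
    # Horizontal coverage must exceed both 100 bp and 20% of the gene length.
    breadth_ok = covered_bp > 100 and covered_bp > 0.2 * gene_length
    # Average vertical coverage must exceed 0.5.
    return breadth_ok and mean_depth > 0.5

def gene_cluster_confidently_absent(n_mapped_reads, core_median_depth):
    """No mappings at all, although the extended core is covered at > 1x median depth."""
    return n_mapped_reads == 0 and core_median_depth > 1.0
```

A gene cluster that is neither present nor confidently absent under these rules would remain uninformative for that sample.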

Raw microbial SNVs were called from uniquely mapping reads using metaSNV v.1.0.3 (ref. 96) with permissive parameters (-c 10 -t 2 -p 0.001 -d 1000000). Candidate SNVs were retained if they were supported by two or more reads each in two or more samples in which the focal gene cluster was confidently detected (see above), before differential downstream filtering. At multiallelic positions the frequency of each observed allele (A, C, G, T) was normalized by the total read depth for all alleles.
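As a minimal sketch of the retention rule and the allele frequency normalization, assuming per-sample read counts are already available (the function names and input dictionaries are hypothetical and independent of metaSNV's own output format):

```python
def retain_snv(support_per_sample, min_reads=2, min_samples=2):
    """Keep a candidate SNV supported by >= 2 reads in each of >= 2 samples.

    support_per_sample: dict mapping sample id -> number of reads supporting
                        the variant, restricted to samples in which the focal
                        gene cluster was confidently detected.
    """
    n_supporting = sum(1 for n in support_per_sample.values() if n >= min_reads)
    return n_supporting >= min_samples

def allele_frequencies(read_counts):
    """Normalize per-allele read counts by the total depth over A, C, G and T.

    Returns None if the position is not covered in the sample.
    """
    total = sum(read_counts.get(base, 0) for base in "ACGT")
    if total == 0:
        return None
    return {base: read_counts.get(base, 0) / total for base in "ACGT"}
```

For instance, a position with six reads carrying A and two carrying G would yield frequencies of 0.75 for A, 0.25 for G and 0 for C and T.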

Based on these data, strain populations were represented by both their specific gene content profile and SNV profile in each sample.

Each species’ local strain population diversity (SPD) and allele distances (AD) between strain populations across samples were estimated as follows. SPD was calculated based on the inverse Simpson index of allele frequencies $p_{\mathrm{A}}, p_{\mathrm{C}}, p_{\mathrm{G}}, p_{\mathrm{T}}$ at each variant position $i$ in the extended core genome ($n_{\mathrm{var}}$ variant positions), normalized by total horizontal coverage (number of covered positions) $\mathrm{cov}_{\mathrm{hor}}$:

$$\mathrm{SPD} = \frac{\sum_{i = 1}^{n_{\mathrm{var}}} \left[ \left( p_{\mathrm{A}}^2 + p_{\mathrm{C}}^2 + p_{\mathrm{G}}^2 + p_{\mathrm{T}}^2 \right)^{-1} - 1 \right]}{\mathrm{cov}_{\mathrm{hor}}}$$

Thus defined, SPD can be interpreted as the average effective number of nondominant alleles in a strain population. SPD ranges between 0 (only one dominant strain detected, that is, no multiallelic positions) and 3 (all four possible alleles present at equal proportions at each variant position). Normalization by total horizontal coverage, $\mathrm{cov}_{\mathrm{hor}}$, of the extended core genome ensures that values are comparable between samples even when a species’ coverage in a sample is incomplete.
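The SPD definition and its stated range can be checked with a short numerical sketch. The function name and the representation of each variant position as a tuple of four allele frequencies are illustrative assumptions.

```python
def strain_population_diversity(variant_freqs, cov_hor):
    """Compute SPD for one species in one sample.

    variant_freqs: list of (pA, pC, pG, pT) tuples, one per variant position
                   in the extended core genome; each tuple sums to 1.
    cov_hor:       number of covered positions in the extended core genome.
    """
    if cov_hor <= 0:
        raise ValueError("cov_hor must be positive")
    total = 0.0
    for freqs in variant_freqs:
        simpson = sum(p * p for p in freqs)   # Simpson index at this position
        total += 1.0 / simpson - 1.0          # inverse Simpson minus one
    return total / cov_hor
```

With a single allele at every position the function returns 0, and with all four alleles at equal frequency at every covered position it returns 3, matching the bounds described above. A single balanced biallelic position among 100 covered positions gives 0.01.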

Intraspecific ADs between strain populations across samples were calculated as the average Euclidean distance between observed allele frequencies at variant positions in the species’ extended core genome, requiring at least 20 variant positions with shared coverage between samples. If a species was not observed in a sample, ADs to that sample were set to 1.
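One possible reading of this definition is sketched below: the Euclidean distance between the two four-allele frequency vectors is computed per shared variant position and then averaged. This per-position interpretation, the function name and the use of `None` for "species not observed" or "too few shared positions" are assumptions made for illustration.

```python
import math

def allele_distance(freqs_a, freqs_b, min_shared=20):
    """Average per-position Euclidean distance between two allele profiles.

    freqs_a, freqs_b: dict mapping variant position -> (pA, pC, pG, pT),
                      or None if the species was not observed in that sample.
    Returns 1.0 if the species is missing from either sample, and None if
    fewer than `min_shared` variant positions are covered in both samples.
    """
    if freqs_a is None or freqs_b is None:
        return 1.0
    shared = freqs_a.keys() & freqs_b.keys()
    if len(shared) < min_shared:
        return None
    return sum(math.dist(freqs_a[p], freqs_b[p]) for p in shared) / len(shared)
```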

Quantification of strain-level outcomes

Colonization by donor strains, persistence of recipient strains and influx of novel strains (environmental or previously below detection limit) in the recipient microbiome following FMT were quantified for every species based on determinant microbial SNVs and gene content profiles using an approach extending earlier work (refs. 25,97). In total, 261 FMT time series (228 allogenic and 33 autologous transfers) for which a donor baseline (in allogenic FMTs; ‘D’), a recipient pre-FMT baseline (‘R’) and at least one recipient post-FMT (‘P’) sample were available were taken into account, and each FMT was represented as a D-R-P sample triad. If available, multiple time points post FMT were scored independently. By definition, because no donor samples were available for autologous FMTs, recipient pre-FMT samples were used instead. An overview of possible strain-level FMT outcomes is provided in Fig. 1c,d.

For each D-R-P sample triad, conspecific strain dynamics were calculated if a species was observed in all three samples (see above) with at least 100 informative (determinant) variant positions either covered with two or more reads or confidently absent (see below). Donor determinant alleles were defined as variants unique to the donor (D) relative to the recipient pre-FMT (R) sample, and vice versa. Post-FMT determinant alleles were defined as variants unique in P relative to both D and R. Given that intraspecific fecal strain populations are often heterogeneous (that is, consist of more than one strain per species), multiple observed alleles at the same variant position were taken into account. In addition, if a gene containing a putative variant position was absent from a sample although the species’ extended core genome was detected, the variant was considered ‘confidently absent’ and treated as informative (and potentially determinant) as well, thereby taking into account differential gene content between strains.
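The definitions of determinant alleles map naturally onto set differences. In the sketch below, each sample is represented as a set of `(position, allele)` pairs, and confident gene absence is encoded as a pseudo-allele `"absent"`; both the representation and the function name are hypothetical simplifications rather than the scoring implementation used in the study.

```python
def determinant_alleles(donor, recipient, post):
    """Split observed alleles of one species into determinant sets.

    donor, recipient, post: sets of (position, allele) tuples observed in the
    D, R and P samples. Several alleles may be present at the same position,
    and "absent" marks a position whose gene is confidently absent.
    """
    donor_det = donor - recipient          # unique to D relative to R
    recipient_det = recipient - donor      # unique to R relative to D
    post_det = post - donor - recipient    # unique to P relative to D and R
    return donor_det, recipient_det, post_det
```

Alleles shared by donor and recipient are not determinant for either, which is why they drop out of both sets.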

The fractions of donor and recipient strains post FMT were quantified based on the detection of donor- and recipient-determinant variants across all informative positions in the P sample. The fraction of novel strains (environmental or previously below detection limit in donor and recipient) was quantified as the fraction of post-FMT determinant variants. Based on these three readouts (fraction of donor, recipient and novel strains) and cutoffs previously established by Li et al. (ref. 25), FMT outcomes were scored categorically as ‘donor colonization’, ‘recipient persistence’, ‘donor–recipient coexistence’ or ‘influx of novel (previously undetected) strains’ for every species (Supplementary Table 5).

In addition to conspecific strain dynamics (that is, where a species was present in D, R and P), we also quantified FMT outcomes that involved the acquisition or loss of entire strain populations. For example, if a species was present in the recipient at baseline but not post FMT, this was considered a ‘species loss’ event. See Fig. 1c and Supplementary Table 5 for a full overview of how different FMT outcome scenarios were scored.

To assess the accuracy of our approach, we simulated FMT time series by shuffling (1) the donor sample, (2) the recipient pre-FMT sample or (3) both. Randomizations were stratified by subject (accounting for the fact that some donors were used in multiple FMTs and that some recipients received repeated treatments) and geography. For each observed D-R-P sample triad, we simulated ten triads for each of the above setups.

Outcomes were further summarized across species by calculating a series of strain population-level metrics for each FMT, defined as follows.

Persistence index: average fraction of persistent recipient strains among all species observed post FMT (that is, fraction of post-FMT strain populations attributable to recipient baseline strains).

Colonization index: average fraction of donor strains among all species post FMT.
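Both indices are simple averages over the species observed post FMT. A minimal sketch, assuming the per-species donor and recipient strain fractions have already been computed (the function name and input layout are hypothetical):

```python
def fmt_indices(per_species):
    """Compute (persistence index, colonization index) for one FMT.

    per_species: dict mapping species id -> (donor_fraction, recipient_fraction)
                 for every species observed post FMT.
    Returns None if no species was observed post FMT.
    """
    n = len(per_species)
    if n == 0:
        return None
    colonization = sum(donor for donor, _ in per_species.values()) / n
    persistence = sum(recipient for _, recipient in per_species.values()) / n
    return persistence, colonization
```

For example, an FMT with one fully donor-colonized species, one fully persistent recipient species and one species with equal donor and recipient fractions would score 0.5 on both indices.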

Modeling and prediction of FMT outcomes

We explored a large set of covariates as putative predictor variables for FMT outcomes, grouped into the following categories: (1) host clinical and procedural variables (for example, FMT indication, pre-FMT bowel preparation, FMT route and so on); (2) community-level taxonomic diversity (species richness, community composition and so on); (3) community-level metabolic profiles (abundance of specific pathways); (4) abundance profiles of individual species; (5) strain-level outcomes for other species in the system; and (6) focal species characteristics, including strain-level diversity; see Supplementary Table 6 for a full list of covariates and their definitions. We further classified covariates as either predictive ex ante variables (that is, knowable before the FMT is conducted) or post hoc variables (that is, pertaining to the post-FMT state, or the relation between pre- and post-FMT states).

We built two types of model to predict FMT strain-level outcomes based on these covariates: (1) FMT-wide models, using summary outcome metrics across all species in a time series (persistence index, colonization index; see above) as response variables; and (2) per-species models for 307 species observed in ≥50 FMTs, using each species’ strain-level outcome in every scored time series as response variable. Unless otherwise indicated, the last available time point for each FMT time series was used. Models were built for each covariate category separately, as well as for combinations of all ex ante and all post hoc variables, respectively.

Given that the number of covariates greatly exceeded the number of available FMT time series, and that several covariates were correlated with each other (Supplementary Fig. 3), FMT outcomes were modeled using ten times fivefold cross-validated LASSO-regularized regression, as implemented in the R package glmnet (v.4.1.3) (ref. 98). Regression coefficients were chosen at one standard error from the cross-validated minimum lambda value and averaged across validation folds.
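The "one standard error" rule selects the most strongly regularized model whose cross-validated error is still within one standard error of the best model. The sketch below illustrates the selection logic only, on precomputed per-fold errors; it does not fit a LASSO model, and the function name and input layout are assumptions rather than the glmnet interface.

```python
import statistics

def select_lambdas(cv_errors):
    """Return (lambda_min, lambda_1se) from per-fold cross-validation errors.

    cv_errors: dict mapping each candidate lambda -> list of per-fold errors
               (at least two folds per lambda).
    """
    summary = {}
    for lam, errors in cv_errors.items():
        mean = statistics.mean(errors)
        # Standard error of the mean across folds.
        se = statistics.stdev(errors) / len(errors) ** 0.5
        summary[lam] = (mean, se)
    # Lambda with the lowest mean cross-validated error.
    lam_min = min(summary, key=lambda lam: summary[lam][0])
    threshold = summary[lam_min][0] + summary[lam_min][1]
    # Largest lambda whose mean error stays within one SE of the minimum.
    lam_1se = max(lam for lam, (mean, _) in summary.items() if mean <= threshold)
    return lam_min, lam_1se
```

Choosing the larger lambda trades a small, statistically indistinguishable loss in accuracy for a sparser model, which is helpful when covariates outnumber observations and are mutually correlated.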

Linear LASSO regression was used to model outcomes with continuous response variables, both for FMT-wide outcomes (persistence index and so on) and for the fraction of colonizing, persisting and coexisting strains per species across FMTs. For linear models, R² of predictions on test sets was averaged across validation folds. Moreover, logistic LASSO regression was used to additionally model binarized FMT outcomes per species, defined as recipient strain resilience, recipient strain turnover and donor strain takeover, based on further summarizing outcome categories in Supplementary Table 5. For logistic models, accuracy was assessed as area under the receiver operating characteristic curve (AUROC) averaged across validation folds.

Statistical analyses

Association of clinical outcomes (excluding a subset of cohorts for which clinical success was not reported; Supplementary Table 3) with FMT strain-level outcomes was tested using Wilcoxon tests (responders versus nonresponders), and also by sequential ANOVA on linear regression models (accounting for additional variables), in each case followed by Benjamini–Hochberg correction for multiple hypothesis tests. Differences in strain-level outcomes between species across taxonomic clades and inferred species phenotypes were tested using ANOVA on linear regression models.
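For reference, the Benjamini–Hochberg step-up adjustment can be written in a few lines of Python. This is a generic textbook implementation with a hypothetical function name, not the code used in the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum ensures that adjusted values never decrease as raw p-values increase.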

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.


