Methods, computer systems, and computer program products for biopolymer engineering are provided. A biopolymer sequence space for a biopolymer of interest is constructed by identifying, using a plurality of rules, a plurality of positions in the biopolymer of interest and, for each respective position in the plurality of positions, substitutions for the respective position. The plurality of positions and the substitutions for each respective position collectively define the biopolymer sequence space. A variant set comprising a plurality of variants of the biopolymer of interest is selected from this sequence space. A property of all or a portion of the variants in the variant set is measured. A sequence-activity relationship is modeled between (i) one or more substitutions at one or more positions of the biopolymer of interest represented by the variant set and (ii) the property measured for all or the portion of the variants in the variant set. The variant set is then redefined to comprise variants whose substitutions at the plurality of positions are selected based on a function of the sequence-activity relationship.
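The define, select, measure, model, and redefine cycle described above can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the hard-coded `SEQUENCE_SPACE` stands in for the output of the rules, `mock_assay` stands in for a laboratory measurement, and the additive mean-difference model is only one simple way to express a sequence-activity relationship, not necessarily the one the disclosure uses.

```python
import itertools
import random

# Hypothetical rule output: position -> candidate residues at that position.
SEQUENCE_SPACE = {3: "AV", 7: "LIM", 12: "KR"}

# Hypothetical "true" effects, used only by the stand-in assay below.
TRUE_EFFECTS = {(3, "V"): 1.0, (7, "I"): 0.5, (7, "M"): -0.5, (12, "R"): 2.0}


def enumerate_space(space):
    """Yield every variant as a tuple of (position, residue) pairs."""
    positions = sorted(space)
    for combo in itertools.product(*(space[p] for p in positions)):
        yield tuple(zip(positions, combo))


def select_variant_set(space, size, rng):
    """Select a random subset of the sequence space as the variant set."""
    all_variants = list(enumerate_space(space))
    return rng.sample(all_variants, min(size, len(all_variants)))


def mock_assay(variant):
    """Stand-in for measuring a property of one variant (assumed additive)."""
    return sum(TRUE_EFFECTS.get(sub, 0.0) for sub in variant)


def fit_additive_model(variants, measurements):
    """Estimate each substitution's effect as the mean property of the
    variants carrying it minus the mean over all measured variants."""
    grand_mean = sum(measurements) / len(measurements)
    totals = {}
    for variant, value in zip(variants, measurements):
        for sub in variant:
            running_sum, count = totals.get(sub, (0.0, 0))
            totals[sub] = (running_sum + value, count + 1)
    return {sub: s / n - grand_mean for sub, (s, n) in totals.items()}


def redefine_space(space, effects, keep=1):
    """Keep the `keep` best-scoring residues at each position.
    Residues never observed in the variant set rank last."""
    new_space = {}
    for pos, residues in space.items():
        ranked = sorted(
            residues,
            key=lambda r: effects.get((pos, r), float("-inf")),
            reverse=True,
        )
        new_space[pos] = "".join(ranked[:keep])
    return new_space


if __name__ == "__main__":
    rng = random.Random(0)
    variants = select_variant_set(SEQUENCE_SPACE, 8, rng)
    measurements = [mock_assay(v) for v in variants]
    effects = fit_additive_model(variants, measurements)
    print(redefine_space(SEQUENCE_SPACE, effects, keep=2))
```

In this sketch the redefined space simply narrows each position to its best-scoring residues. A fuller treatment might also weigh model uncertainty or re-admit untested substitutions in the next round, which is beyond what the abstract specifies.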