The CRISPR/Cas9 system, derived from an adaptive prokaryotic defense system against foreign nucleic acids, is a promising RNA-guided genome editing technology that has been successfully applied to bacteria, animals and plants. The system is composed of a Cas9 nuclease and a single-guide RNA (sgRNA), which binds by complementarity to a ~20 nt target sequence. Many sgRNA design tools have been developed so far. However, they were usually developed for protein-coding genes without considering long noncoding RNA (lncRNA) genes, even though lncRNA genes differ from protein-coding genes in nucleic acid sequence and in their preferred genome editing mechanisms.
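As a rough illustration of what "a ~20 nt target sequence" means in practice, the sketch below enumerates candidate target sites on both strands of a DNA sequence. It assumes the commonly used SpCas9 requirement of an NGG PAM immediately 3' of the target, which is not stated in the text above; the function names `revcomp` and `find_candidates` are hypothetical and are not part of CRISPRlnc.

```python
import re

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_candidates(seq, guide_len=20):
    """List candidate sgRNA target sites followed by an NGG PAM.

    Assumption: SpCas9-style NGG PAM directly 3' of the target.
    'start' is a 0-based offset on the scanned strand, so for '-'
    hits it refers to the reverse-complemented sequence.
    """
    seq = seq.upper()
    # A lookahead keeps the match zero-width so overlapping sites are found.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    hits = []
    for strand, strand_seq in (("+", seq), ("-", revcomp(seq))):
        for match in pattern.finditer(strand_seq):
            hits.append({
                "strand": strand,
                "start": match.start(),
                "guide": match.group(1),
                "pam": match.group(2),
            })
    return hits
```

Such an enumeration only produces candidates; ranking them by expected on-target validity is the part that a design tool like CRISPRlnc addresses.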
In this project, we evaluated the performance differences of a series of known sgRNA design tools for the CRISPR/Cas9 system on coding and noncoding datasets. We also analyzed the basis of these differences, namely the distinctive features of lncRNA-specific sgRNAs in nucleic acid sequence, RNA structure thermokinetics, genome location and editing mechanism preference. Furthermore, we propose a new machine learning method, CRISPRlnc, for designing lncRNA-specific sgRNAs in the CRISPR/Cas9 system. CRISPRlnc was trained on noncoding datasets from two different mechanisms, CRISPR knock-out (CRISPRko) and CRISPR inhibition (CRISPRi), to capture the different feature preferences for on-target validity under each mechanism. Performance comparisons on multiple datasets showed that CRISPRlnc substantially outperformed existing methods in lncRNA-specific sgRNA design under both the CRISPRko and CRISPRi mechanisms.
For convenience, we provide CRISPRlnc as an online web service and as a downloadable program on GitHub (https://github.com/Mera676/CRISPRlnc). For CRISPRko-based design, we added a paired-sgRNA design function, because CRISPRko for lncRNAs preferentially knocks out large fragments of the gene body using two sgRNAs working together, resulting in loss of function of the whole lncRNA gene. For CRISPRi-based design, the tool automatically retrieves the promoter sequence of the lncRNA gene from the user-supplied gene ID and designs sgRNAs against it, because CRISPRi tends to target the promoter region of the lncRNA gene and thereby inhibit its transcription. To help users further evaluate sgRNA performance, CRISPRlnc includes an off-target risk analysis for each target. In addition, by integrating the three scores for on-target validity, off-target risk and genomic location, we provide a composite weighted score for each sgRNA; higher scores are better.
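The composite weighted score can be pictured with a minimal sketch such as the one below. This is not the CRISPRlnc implementation: the weights, the assumption that each component score lies in [0, 1], the inversion of off-target risk so that higher is better, and the names `SgRNAScores` and `composite_score` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SgRNAScores:
    """Component scores for one sgRNA, each assumed to lie in [0, 1]."""
    on_target: float        # predicted on-target validity (higher is better)
    off_target_risk: float  # predicted off-target risk (higher is worse)
    location: float         # genomic location score (higher is better)


def composite_score(scores, w_on=0.5, w_off=0.3, w_loc=0.2):
    """Weighted average of the three components; higher is better.

    The default weights are illustrative placeholders, not the values
    used by CRISPRlnc.
    """
    for value in (scores.on_target, scores.off_target_risk, scores.location):
        if not 0.0 <= value <= 1.0:
            raise ValueError("component scores are assumed to lie in [0, 1]")
    total_weight = w_on + w_off + w_loc
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = (
        w_on * scores.on_target
        + w_off * (1.0 - scores.off_target_risk)  # invert risk so higher is better
        + w_loc * scores.location
    )
    return weighted / total_weight
```

Under these assumptions, candidates could be ranked with `sorted(candidates, key=composite_score, reverse=True)`, so that the sgRNA with the best balance of the three criteria comes first.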