All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

SRE实战 互联网时代守护先锋,助力企业售后服务体系运筹帷幄!一键直达领取阿里云限量特价优惠。

Example:

Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"

Output: ["AAAAACCCCC", "CCCCCAAAAA"]

题目大意:

找到字符串中连续出现1次以上的子串,子串的长度为10。

解法:

使用一个map保存出现字符串以及对应的次数,如果出现2次及以上,加入到结果中。

java:

class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        HashMap<String,Integer>t=new HashMap();
        List<String>res=new ArrayList<String>();
        int start=0,i=10;
        for(;i<=s.length();start++,i++){
            if(t.containsKey(s.substring(start,i))&&t.get(s.substring(start,i))==1){
                int count=t.get(s.substring(start,i));
                t.put(s.substring(start,i),++count);
                res.add(s.substring(start,i));
            }
            else if(t.containsKey(s.substring(start,i))){
                int count=t.get(s.substring(start,i));
                t.put(s.substring(start,i),++count);
            }
            else t.put(s.substring(start,i),1);
        }

        return res;
    }
}

 想复杂了,可以使用两个set进行存储,一个存字符串的所有不重复子串,一个存出现两次及以上的子串。

class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        Set<String>seen=new HashSet<>(),repeat=new HashSet<>();
        for(int i=0;i+9<s.length();i++){
            if(!seen.add(s.substring(i,i+10)))
                repeat.add(s.substring(i,i+10));
        }
        return new ArrayList<>(repeat);
    }
}

 最佳的方法:

  使用位图法(Java中自带BitSet,BitMap类)。主要是提高空间效率

class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        int len = s.length(), mod = 1 << 20, mask = 0xFFFFF, n = 0;
        if (len < 11) return new ArrayList<>();
        
        // 'A' -> 0  'C' -> 1 'G' -> 2 'T' -> 3
        int[] ints = new int['T' + 1];
        ints['C'] = 1; ints['G'] = 2; ints['T'] = 3;
 
        List<String> list = new ArrayList<>();
        BitSet seen = new BitSet(mod), added = new BitSet(mod);
        for (int i = 0; i < len; i++) {
            n = (n << 2) & mask | ints[s.charAt(i)]; // 获得连续10个字符串的哈希码 (基于上述的字符串->整数映射)
            if (i >= 9)
                if (seen.get(n)) { // 如果seen中在n位置上已有数字 则返回true 否则返回false
                    if (!added.get(n)) {
                        added.set(n);
                        list.add(s.substring(i - 9, i + 1));
                    }
                } else seen.set(n);
        }
        return list;
    }
}

  

扫码关注我们
微信号:SRE实战
拒绝背锅 运筹帷幄