Mathematical Modeling Community - Math China (数学中国)

Title: Python: comparing and handling differences between texts

Author: 2744557306    Time: 2024-3-31 11:12

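difflib is a module in the Python standard library for comparing sequences, most often texts, and working with the differences between them. It provides functions and classes for generating difference reports, computing similarity ratios, and finding the longest matching block of two sequences (for strings, the longest common contiguous substring; this is not the same as the longest common subsequence).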
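To make the substring/subsequence distinction concrete: SequenceMatcher works with contiguous matching blocks, so find_longest_match() returns the longest common substring, while a longest common subsequence may skip characters. A minimal sketch comparing the two on the strings used throughout this post; lcs_length is a hypothetical helper written for illustration with the textbook dynamic-programming recurrence, and is not part of difflib.

```python
import difflib


def lcs_length(a, b):
    """Hypothetical helper: length of the longest common subsequence (classic DP)."""
    # prev[j] = LCS length of the part of a processed so far and b[:j]
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


text1 = "hello world"
text2 = "hello there"

# Longest contiguous block shared by both strings
match = difflib.SequenceMatcher(None, text1, text2).find_longest_match(
    0, len(text1), 0, len(text2))

print(match.size)                # longest common substring length
print(lcs_length(text1, text2))  # longest common subsequence length
```

Here the subsequence is one character longer than the block, because it can also pick up the "r" shared by "world" and "there".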
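Installation: difflib is a built-in module, so nothing needs to be installed.

Common usage 1: comparing differences

import difflib

text1 = "hello world"
text2 = "hello there"

# A string is a sequence of characters, so this diff is character by character
diff = difflib.ndiff(text1, text2)

print('\n'.join(diff))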
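Because the example above passes plain strings, ndiff emits one output line per character. For multi-line text it is usually more readable to split into lines first. A minimal sketch with made-up sample text: each output line starts with a two-character code, "  " for unchanged, "- " for only in the first input, "+ " for only in the second, and "? " for an intraline hint that ndiff may add when two lines are similar enough.

```python
import difflib

old = "hello world\ngoodbye\n"
new = "hello there\ngoodbye\n"

# keepends=True keeps each '\n', so the result can be printed with ''.join()
diff = list(difflib.ndiff(old.splitlines(keepends=True),
                          new.splitlines(keepends=True)))

print(''.join(diff), end='')
```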
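Common usage 2: comparing the differences between files

import difflib

# Assumes both files are UTF-8 encoded
with open('file1.txt', encoding='utf-8') as file1, open('file2.txt', encoding='utf-8') as file2:
    # readlines() keeps the trailing '\n' on every line
    diff = difflib.ndiff(file1.readlines(), file2.readlines())
    # The lines already end in '\n', so join with '' instead of '\n'
    print(''.join(diff), end='')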
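ndiff prints every line of both files, which gets long for large files. The unified format (the one produced by `diff -u` and Git) is often more practical: difflib.unified_diff() shows only the changed regions plus n lines of context (3 by default). A minimal sketch that writes two small temporary files so it runs on its own; unified_file_diff is a hypothetical helper name and the UTF-8 encoding is an assumption.

```python
import difflib
import tempfile
from pathlib import Path


def unified_file_diff(path1, path2):
    """Hypothetical helper: unified diff of two UTF-8 text files as a list of lines."""
    with open(path1, encoding='utf-8') as f1, open(path2, encoding='utf-8') as f2:
        lines1 = f1.readlines()
        lines2 = f2.readlines()
    # fromfile/tofile only label the '---' and '+++' header lines
    return list(difflib.unified_diff(lines1, lines2,
                                     fromfile=str(path1), tofile=str(path2)))


with tempfile.TemporaryDirectory() as tmp:
    p1 = Path(tmp) / 'file1.txt'
    p2 = Path(tmp) / 'file2.txt'
    p1.write_text('apple\nbanana\ncherry\n', encoding='utf-8')
    p2.write_text('apple\nbanana\nkiwi\n', encoding='utf-8')
    diff = unified_file_diff(p1, p2)

print(''.join(diff), end='')
```

In the hunk, lines starting with a space are context, "-" lines come only from the first file and "+" lines only from the second.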
Common usage 3: comparing the differences between lists

import difflib

list1 = ['apple', 'banana', 'cherry']
list2 = ['apple', 'banana', 'kiwi']

# Each list element is treated as one "line" of the diff
diff = difflib.ndiff(list1, list2)

print('\n'.join(diff))
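The two-character prefixes in ndiff output make it easy to post-process. A minimal sketch that pulls the removed and added items out of the list comparison above; "? " hint lines, if any, are simply skipped by these filters.

```python
import difflib

list1 = ['apple', 'banana', 'cherry']
list2 = ['apple', 'banana', 'kiwi']

diff = list(difflib.ndiff(list1, list2))

# '- ' marks items only in list1, '+ ' items only in list2; drop the 2-char prefix
removed = [line[2:] for line in diff if line.startswith('- ')]
added = [line[2:] for line in diff if line.startswith('+ ')]

print('removed:', removed)
print('added:', added)
```

Unlike set(list1) - set(list2), this keeps the original order of the items.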

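Common usage 4: computing the similarity of two strings

import difflib

text1 = "hello world"
text2 = "hello there"

# The first argument (isjunk) is None, so no characters are ignored as junk;
# ratio() returns a float between 0 and 1, where 1.0 means identical
similarity = difflib.SequenceMatcher(None, text1, text2).ratio()

print(similarity)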

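Output (a similarity of about 63.6%):

0.6363636363636364

Common usage 5: getting the matching blocks of two strings

import difflib

text1 = "hello world"
text2 = "hello there"

# Each Match(a, b, size) means text1[a:a+size] == text2[b:b+size];
# the last entry is always a zero-size sentinel
blocks = difflib.SequenceMatcher(None, text1, text2).get_matching_blocks()

print(blocks)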

Output:

[Match(a=0, b=0, size=6), Match(a=8, b=9, size=1), Match(a=11, b=11, size=0)]
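These matching blocks also explain the similarity value in usage 4: ratio() is defined as 2.0*M/T, where M is the total size of the matching blocks and T the combined length of both sequences, here 2 * (6 + 1) / (11 + 11) = 14/22 ≈ 0.636. get_opcodes() turns the same blocks into edit operations ('equal', 'replace', 'delete', 'insert') with index ranges into both strings. A minimal sketch that checks the formula and replays the opcodes to rebuild the second string.

```python
import difflib

text1 = "hello world"
text2 = "hello there"
sm = difflib.SequenceMatcher(None, text1, text2)

# ratio() = 2.0 * M / T, M = matched elements, T = total elements in both strings
matched = sum(block.size for block in sm.get_matching_blocks())
manual_ratio = 2.0 * matched / (len(text1) + len(text2))
print(matched, manual_ratio, sm.ratio())

# Each opcode (tag, i1, i2, j1, j2) says how text1[i1:i2] becomes text2[j1:j2]
opcodes = sm.get_opcodes()
for tag, i1, i2, j1, j2 in opcodes:
    print(f"{tag:8} {text1[i1:i2]!r} -> {text2[j1:j2]!r}")

# Keep the 'equal' parts of text1 and take everything else from text2
rebuilt = ''.join(text1[i1:i2] if tag == 'equal' else text2[j1:j2]
                  for tag, i1, i2, j1, j2 in opcodes)
print(rebuilt)
```

For these strings the operations are: keep "hello ", replace "wo" with "the", keep "r", replace "ld" with "e".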
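Common usage 6: getting the longest common substring (longest matching block) of two strings

import difflib

text1 = "hello world"
text2 = "hello there"

# Search the whole of text1 and text2; on Python 3.9+ these arguments
# are the defaults, so find_longest_match() with no arguments also works
match = difflib.SequenceMatcher(None, text1, text2).find_longest_match(0, len(text1), 0, len(text2))

print(match)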
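Output:

Match(a=0, b=0, size=6)

To get the matched text itself, slice the first string with the returned indices:

import difflib

text1 = "hello world"
text2 = "hello there"

match = difflib.SequenceMatcher(None, text1, text2).find_longest_match(0, len(text1), 0, len(text2))

# match.a is the start index in text1 and match.size the length of the block
print(text1[match.a: match.a + match.size])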
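Output: 'hello ' (six characters; the match includes the trailing space)

Common usage 7: comparing two strings and returning a context diff

import difflib

text1 = "hello world"
text2 = "hello there"

# Passing plain strings compares them character by character; pass lists of
# lines (e.g. text.splitlines()) for a line-by-line diff.
# lineterm='' keeps '\n' off the header lines, since join() adds the newlines
diff = difflib.context_diff(text1, text2, lineterm='')

print('\n'.join(diff))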
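context_diff() is mainly intended for lists of lines, such as the output of readlines() or splitlines(). A minimal sketch on made-up three-line texts, using the optional fromfile/tofile labels and n (the number of context lines, 3 by default): changed lines are marked "! ", and each hunk shows the old range (*** ... ****) followed by the new range (--- ... ----).

```python
import difflib

old_lines = ["line one", "line two", "line three"]
new_lines = ["line one", "line 2", "line three"]

# fromfile/tofile only label the header; n=1 keeps one line of context
diff = list(difflib.context_diff(old_lines, new_lines,
                                 fromfile='a.txt', tofile='b.txt',
                                 n=1, lineterm=''))

print('\n'.join(diff))
```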

Welcome to the Mathematical Modeling Community - Math China (http://www.madio.net/)