1. Objectives 学习目标

  • To understand the string and list data types and their representations 理解字符串和列表数据类型及其表示
  • To become familiar with operations on strings/lists through built-in functions and methods 熟悉通过内置函数和方法对字符串/列表的操作
  • To be able to apply string formatting to produce attractive output 能够应用字符串格式化产生美观的输出

2. Sequence 序列

What is a Sequence? 什么是序列?

  • A positionally ordered collection of items 位置有序的项目集合
  • Can refer to any item using its index number 可以使用索引号引用任何项目

Necessity of Sequence 序列的必要性

  • Words in a document 文档中的单词
  • Students in a course 课程中的学生
  • Experimental data 实验数据
  • Business customers 商业客户

Types of Sequences 序列类型

  • Strings字符串: 'hello world!'
  • Lists列表: ['h','e','l','l','o',' ','w','o','r','l','d','!']
  • Files文件

Sequence Characteristics 序列特性

  • Homogeneous同质序列: All elements same type (e.g., strings) 所有元素类型相同
  • Heterogeneous异质序列: Mixed element types (e.g., lists) 混合元素类型
  • Homogeneous sequences can be stored and processed more efficiently, because every element has the same type 同质序列的存储和处理可以更高效,因为所有元素类型相同

Iterables 可迭代对象

  • Any sequence is iterable 任何序列都是可迭代的
  • Iterables are more general than sequences 可迭代对象比序列更通用
  • Can get each element one by one 可以逐个获取每个元素
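
The difference between a sequence and a more general iterable can be sketched as follows. This is a minimal illustration using only built-ins; the names `letters` and `squares_gen` are made up for the example.

```python
# A list is a sequence: it supports indexing, len(), and iteration.
letters = ["a", "b", "c"]
first = letters[0]
count = len(letters)

# A generator is iterable but not a sequence: it cannot be indexed.
squares_gen = (n * n for n in range(4))
try:
    squares_gen[0]
    indexable = True
except TypeError:
    indexable = False

# It can still hand out its elements one by one.
collected = [value for value in squares_gen]
print(first, count, indexable, collected)
```

Every sequence can be used in a `for` loop, but not everything usable in a `for` loop supports index numbers.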

3. Data Type: String 数据类型:字符串

String Basics 字符串基础

  • Sequence of characters enclosed in quotation marks 引号内的字符序列
  • Use " " or ' ' for single-line strings 单行字符串使用双引号或单引号
  • Use """ """ or ''' ''' for multi-line strings 多行字符串使用三引号
>>> firstName = input("Please enter your name: ")
Please enter your name: John
>>> print("Hello", firstName)
Hello John

String Indexing 字符串索引

  • Forward index正向索引: 0, 1, 2, ...
  • Reverse index反向索引: -1, -2, -3, ...

text = "Hello World"
print(text[0])    # 'H'
print(text[-1])   # 'd'

Notes注意:

  • Index out of range causes error 索引越界会导致错误
  • Index must be integer 索引必须是整数
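
The two notes above can be checked directly. The sketch below assumes nothing beyond built-ins and catches each error so the script keeps running; the variable names are illustrative.

```python
text = "Hello World"

# Valid indexes run from 0 to len(text) - 1 (or from -len(text) to -1).
last_valid = text[len(text) - 1]

# An index past the end raises IndexError.
try:
    text[len(text)]
    out_of_range_error = None
except IndexError as err:
    out_of_range_error = type(err).__name__

# A non-integer index raises TypeError.
position = 1.0
try:
    text[position]
    float_index_error = None
except TypeError as err:
    float_index_error = type(err).__name__

print(last_valid, out_of_range_error, float_index_error)
```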

String Slicing 字符串切片

<string>[<start>:<end>:<step>]
text = "Hello World"
print(text[0:5])     # 'Hello'
print(text[6:])      # 'World' (index 6 through the end; text[6] alone is just 'W')
print(text[::2])     # 'HloWrd' 0,2,4...
print(text[::-1])    # 'dlroW olleH' (reverse) -1,-2,-3...
print(text[::-2])    # 'drWolH' -1,-3,-5...

4. Simple String Processing 简单字符串处理

Username Generation 用户名生成

  • First letter of first name + up to 7 letters from last name 名字首字母 + 最多7个姓氏字母
  • Example: Zaphod Beeblebrox → zbeebleb
# Username generation program
firstName = input("Please enter your first name (all lowercase): ")
lastName = input("Please enter your last name (all lowercase): ")
uname = firstName[0] + lastName[:7]
print("uname =", uname)

Month Abbreviation 月份缩写

# Program to get month abbreviation
months = "JanFebMarAprMayJunJulAugSepOctNovDec"
n = int(input("Enter a month number (1-12): "))
pos = (n-1) * 3
monthAbbrev = months[pos:pos+3]
print("The month abbreviation is", monthAbbrev)

Weakness弱点: Works only when every abbreviation has the same length (three letters here) 仅当所有缩写长度相同(此处为3个字母)时才有效

5. String Operators 字符串运算符

Concatenation and Repetition Operators 连接和重复运算符

  • Concatenation连接: +
  • Repetition重复: *

>>> "spam" + "eggs"
'spameggs'
>>> 3 * "spam"
'spamspamspam'
>>> (3 * "spam") + ("eggs" * 5)
'spamspamspameggseggseggseggseggs'

len() function:

>>> len("spam")
4
>>> for ch in "Spam!":
...     print(ch, end=" ")
...
S p a m !

Comparison Operators 比较运算符

  • Based on Unicode code points (the same as ASCII values for ASCII characters) 基于Unicode码点(对ASCII字符而言与ASCII值相同)
  • Compare character by character 逐个字符比较
>>> "apple" < "banana"
True
>>> "Apple" < "apple"  # 'A' (65) < 'a' (97)
True

Member Operators 成员运算符

  • in and not in
  • Return boolean values 返回布尔值
>>> 'a' in 'apple'
True
>>> 'seed' not in 'apple'
True

Formatting Operators 格式化运算符

'%2d/%2d string' % (var1, var2)

Format specifiers格式说明符:

  • %d: Decimal integer 十进制整数
  • %2d: Width 2, right-justified, space-filled 宽度2,右对齐,空格填充
  • %02d: Width 2, right-justified, zero-filled 宽度2,右对齐,零填充
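
A short sketch of the three specifiers side by side, using a month/day pair as sample values (the variable names are illustrative):

```python
month, day = 3, 7

plain = "%d/%d" % (month, day)            # no padding
space_padded = "%2d/%2d" % (month, day)   # width 2, padded with spaces
zero_padded = "%02d/%02d" % (month, day)  # width 2, padded with zeros

print(repr(plain))
print(repr(space_padded))
print(repr(zero_padded))
```

Printing with `repr()` makes the padding spaces visible.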

6. String Representation 字符串表示

Character Encoding 字符编码

  • ASCII: 7-bit codes (128 characters), US-centric ASCII码,7位编码(128个字符),以美国为中心
  • Extended ASCII: Additional characters 扩展ASCII,附加字符
  • Unicode: Universal standard, 100,000+ characters Unicode,通用标准,10万+字符
  • UTF-8: Unicode Transformation Format UTF-8,Unicode转换格式
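
The relationship between Unicode code points and UTF-8 bytes can be sketched with `str.encode` and `bytes.decode`. The sample characters are chosen only for illustration.

```python
ascii_char = "A"
chinese_char = "中"

# A code point is a number; UTF-8 turns it into one or more bytes.
ascii_bytes = ascii_char.encode("utf-8")      # one byte, same as ASCII
chinese_bytes = chinese_char.encode("utf-8")  # several bytes

# Decoding the bytes gives back the original character.
round_trip = chinese_bytes.decode("utf-8")

print(ord(ascii_char), ascii_bytes, len(ascii_bytes))
print(ord(chinese_char), chinese_bytes, len(chinese_bytes))
```

ASCII characters keep their one-byte encoding in UTF-8, which is why plain ASCII text is also valid UTF-8.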

Character Functions 字符函数

>>> ord("A")    # Get numeric code 获取数字代码
65
>>> ord("a")
97
>>> chr(97)     # Get character from code 从代码获取字符
'a'
>>> chr(65)
'A'

7. String Formatting 字符串格式化

Format Method 格式化方法

<template-string>.format(<values>)
total = 1.5
print("The total value of your change is ${0:0.2f}".format(total))
# Output: The total value of your change is $1.50

Format specifier格式说明符: <width>.<precision><type>

  • 0.2f: Width 0, precision 2, fixed-point number 宽度0,精度2,定点数

Examples 示例

>>> "Hello {0} {1}, you may have won ${2}".format("Mr.", "Smith", 10000)
'Hello Mr. Smith, you may have won $10000'
 
>>> 'This int, {0:5}, was placed in a field of width 5'.format(7)
'This int,     7, was placed in a field of width 5'
 
>>> 'This float, {0:10.5f}, is fixed at 5 decimal places'.format(3.1415926)
'This float,    3.14159, is fixed at 5 decimal places'

Justification 对齐

>>> "left: {0:<5}".format("Hi!")
'left: Hi!  '
>>> "right: {0:>5}".format("Hi!")
'right:   Hi!'
>>> "centered: {0:^5}".format("Hi!")
'centered:  Hi! '

8. Programming an Encoder 编程实现编码器

Encoding Algorithm 编码算法

# Encode message to Unicode numbers
message = input("Please enter the message to encode: ")
print("Here are the Unicode codes:")
 
for ch in message:
    print(ord(ch), end=" ")
print()  # End the line of output

Decoding Algorithm 解码算法

# Decode Unicode numbers to message
inString = input("Please enter the Unicode-encoded message: ")
 
message = ""
for numStr in inString.split():
    codeNum = int(numStr)
    message = message + chr(codeNum)
 
print("\nThe decoded message is:", message)

Improved Decoder with Lists 使用列表改进解码器

# More efficient decoder using lists
inString = input("Please enter the Unicode-encoded message: ")
 
chars = []  # Create empty list 创建空列表
for numStr in inString.split():
    codeNum = int(numStr)
    chars.append(chr(codeNum))  # Append to list 追加到列表
 
message = "".join(chars)  # Join list into string 将列表连接为字符串
print("\nThe decoded message is:", message)
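
The encoder and the list-based decoder can also be written as functions, which makes a round trip easy to check without typing input. This is a sketch; the function names `encode` and `decode` are not from the programs above.

```python
def encode(message):
    """Return the Unicode code of each character, separated by spaces."""
    return " ".join(str(ord(ch)) for ch in message)


def decode(codes):
    """Rebuild a message from space-separated Unicode codes."""
    chars = []
    for num_str in codes.split():
        chars.append(chr(int(num_str)))
    return "".join(chars)


encoded = encode("Hi!")
decoded = decode(encoded)
print(encoded)
print(decoded)
```

Collecting the characters in a list and joining once avoids building a new string on every pass through the loop.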

9. String Methods 字符串方法

Common String Methods 常用字符串方法

s.capitalize()      # First character capitalized 首字母大写
s.title()           # Each word capitalized 每个单词首字母大写
s.center(width)     # Center in field 在字段中居中
s.count(sub)        # Count occurrences 计数出现次数
s.find(sub)         # Find first position 查找第一个位置
s.join(list)        # Join the strings in list, using s as the separator 以s为分隔符连接列表中的字符串
s.lower()           # Convert to lowercase 转换为小写
s.upper()           # Convert to uppercase 转换为大写

Stripping Methods 去除方法

s.strip()           # Remove leading/trailing whitespace 去除首尾空白
s.lstrip()          # Remove leading whitespace 去除开头空白
s.rstrip()          # Remove trailing whitespace 去除结尾空白

Replacement and Splitting 替换和分割

s.replace(old, new)     # Replace substrings 替换子字符串
s.split()               # Split into list 分割为列表

Validation Methods 验证方法

s.islower()         # All characters lowercase? 所有字符小写?
s.isupper()         # All characters uppercase? 所有字符大写?
s.isalpha()         # All characters alphabetic? 所有字符字母?
s.isdigit()         # All characters digits? 所有字符数字?
s.isalnum()         # All characters alphanumeric? 所有字符字母数字?

Advanced Methods 高级方法

# Find within a range (start and end are optional, positional arguments)
s.find(sub, start, end)
 
# Join with separator
s.join(sequence)
 
# Count occurrences
s.count(substring)

10. From Encoding to Encryption 从编码到加密

Encryption 加密

  • Process of encoding information for secrecy 为保密而编码信息的过程
  • Cryptography密码学: Study of encryption methods 加密方法的研究

Real-world Application 实际应用

  • Used in credit card transmissions 用于信用卡传输
  • Protects personal information 保护个人信息

11. Data Type: List 数据类型:列表

List Basics 列表基础

  • Sequences of arbitrary values 任意值的序列
  • Can contain mixed data types 可以包含混合数据类型
myList = [1, "Spam", 4, "U"]  # Mixed list 混合列表
myList = []  # Empty list 空列表
myList = [Point(1,1), Point(1,2)]  # Objects (Point is assumed to be defined elsewhere) 对象
 
student = [2020310912, 'Xiaoming', 'Male', 'China',
           ['Information School', 'Computer Science', 20020501]]  # Nested list 列表嵌套

List Indexing and Slicing 列表索引和切片

student = [2020310912, 'Xiaoming', 'Male', 'China','Information School', 'Computer Science', 20020501]
 
print(student[1])           # 'Xiaoming'
print(student[4])           # 'Information School'
print(student[:-3])         # [2020310912, 'Xiaoming', 'Male', 'China'] (index 0 up to, but not including, the third-from-last element)
print(student[1:5:2])       # ['Xiaoming', 'China'] Slice with step 带步长的切片
print(student[:-3:-1])      # [20020501, 'Computer Science'] (from the last element backwards, stopping before the third-from-last)
 
# Chained indexing: index the list, then the string 链式索引
print(student[4][0])        # 'I' from 'Information School'

12. Immutable vs. Mutable 不可变 vs 可变

Immutable Objects 不可变对象

  • Numbers, strings, tuples 数字、字符串、元组
  • Cannot be changed 不能更改
  • New object created when modified 修改时创建新对象

Mutable Objects 可变对象

  • Lists, dictionaries, sets 列表、字典、集合
  • Can be changed in place 可以原地更改
  • Object identity (as reported by id()) stays the same after modification 修改后对象标识(id()的返回值)保持不变
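
The contrast can be observed with `is` and `id()`. The following is a small sketch with illustrative variable names:

```python
# Strings are immutable: "changing" one builds a new object.
original = "spam"
changed = original
changed += "!"  # rebinds 'changed' to a new string
strings_same = changed is original

# Item assignment on a string is not allowed at all.
try:
    original[0] = "S"
    item_assignment_allowed = True
except TypeError:
    item_assignment_allowed = False

# Lists are mutable: append() changes the same object in place.
numbers = [1, 2, 3]
alias = numbers
id_before = id(numbers)
numbers.append(4)
list_same = id(numbers) == id_before

print(original, changed, strings_same, item_assignment_allowed)
print(alias, list_same)
```

Because `alias` and `numbers` refer to the same list, the appended element is visible through both names.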

13. Lists as Sequences 列表作为序列

List Operations 列表操作

>>> [1,2] + [3,4]          # Concatenation 连接
[1, 2, 3, 4]
>>> [1,2] * 3              # Repetition 重复
[1, 2, 1, 2, 1, 2]
>>> grades = ['A', 'B', 'C', 'D', 'F']
>>> grades[0]               # Indexing 索引
'A'
>>> grades[2:4]             # Slicing 切片
['C', 'D']
>>> len(grades)             # Length 长度
5

Month Program with Lists 使用列表的月份程序

# Improved month program using lists
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", 
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
 
n = int(input("Enter a month number (1-12): "))
monthAbbrev = months[n-1]
print("The month abbreviation is", monthAbbrev)
 
# Can easily extend to full names 可以轻松扩展为全名
full_months = ["January", "February", "March", "April", 
               "May", "June", "July", "August", 
               "September", "October", "November", "December"]

14. List Methods 列表方法

append() Method 追加方法

# Create list of squares 创建平方列表
squares = []
for i in range(1, 101):
    squares.append(i * i)
print(squares)  # [1, 4, 9, ..., 10000]

join() Method 连接方法

# Join list into string 将列表连接为字符串
words = ['Hello', 'world', '!']
message = " ".join(words)  # 'Hello world !'

14.1 Exercises 练习

Morse Code 摩斯电码

Task任务: Convert English sentence to Morse code 将英文句子转换为摩斯电码

morse_code = [".-","-...","-.-.","-..",".","..-.","--.","....",
              "..",".---","-.-",".-..","--","-.","---",".--.",
              "--.-",".-.","...","-","..-","...-",".--","-..-",
              "-.--","--.."]
 
# Convert letters to Morse code 将字母转换为摩斯电码
# Use list indexing based on character position 基于字符位置使用列表索引
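
One possible way to approach this exercise is sketched below. It assumes that letters within a word are separated by a space, words by " / ", and that non-letter characters are skipped; none of these conventions are fixed by the task, and the function name `to_morse` is made up.

```python
morse_code = [".-", "-...", "-.-.", "-..", ".", "..-.", "--.", "....",
              "..", ".---", "-.-", ".-..", "--", "-.", "---", ".--.",
              "--.-", ".-.", "...", "-", "..-", "...-", ".--", "-..-",
              "-.--", "--.."]


def to_morse(sentence):
    """Convert letters A-Z (any case) to Morse code; skip other characters."""
    coded_words = []
    for word in sentence.upper().split():
        codes = []
        for ch in word:
            if "A" <= ch <= "Z":
                # 'A' maps to index 0, 'B' to index 1, and so on.
                codes.append(morse_code[ord(ch) - ord("A")])
        coded_words.append(" ".join(codes))
    return " / ".join(coded_words)


print(to_morse("SOS"))
print(to_morse("Hi there"))
```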

Caesar Cipher 凯撒密码

Task任务: Implement Caesar encryption with variable shift 实现可变位移的凯撒加密

# Caesar cipher implementation
plaintext = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
shift = 19
ciphertext = ""
 
for char in plaintext:
    # Shift only uppercase A-Z; the arithmetic below is relative to ord('A')
    if "A" <= char <= "Z":
        # Apply shift and handle wrap-around 应用位移并处理回绕
        shifted = (ord(char) - ord('A') + shift) % 26
        ciphertext += chr(shifted + ord('A'))
    else:
        ciphertext += char
 
print("Ciphertext:", ciphertext)
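
The same idea can be wrapped in a function that takes the shift as a parameter and handles both cases. This is a sketch, not a prescribed solution; the name `caesar` is an assumption, and decryption is done simply by shifting in the opposite direction.

```python
def caesar(text, shift):
    """Shift letters A-Z and a-z by 'shift' places; leave other characters alone."""
    result = []
    for ch in text:
        if "A" <= ch <= "Z":
            base = ord("A")
        elif "a" <= ch <= "z":
            base = ord("a")
        else:
            result.append(ch)
            continue
        # % 26 wraps around the alphabet, also for negative shifts.
        result.append(chr((ord(ch) - base + shift) % 26 + base))
    return "".join(result)


secret = caesar("Hello, World!", 3)
restored = caesar(secret, -3)
print(secret)
print(restored)
```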

15. Files: Multi-line Strings 文件:多行字符串

File Basics 文件基础

  • Sequence of data stored in secondary storage, such as a disk 存储在辅助存储器(如磁盘)中的数据序列
  • Contains multiple lines of text 包含多行文本
  • Uses newline character \n for line breaks 使用换行符\n表示换行

Multi-line Strings 多行字符串

multi_line = """This is a
multi-line
string"""
print(multi_line)

16. File Processing 文件处理

File Operations 文件操作

  1. Opening打开: Associate disk file with memory object 将磁盘文件与内存对象关联
  2. Processing处理: Read from or write to file 从文件读取或写入文件
  3. Closing关闭: Complete operations and bookkeeping 完成操作和簿记

Opening Files 打开文件

<filevar> = open(<name>, <mode>)

Modes模式:

  • 'r': Read 读取
  • 'w': Write (overwrites) 写入(覆盖)
  • 'a': Append 追加
  • 'r+': Read and write 读取和写入
infile = open("numbers.dat", "r")
outfile = open("output.txt", "w")

File Paths 文件路径

Relative path相对路径:

f = open("../data.txt", "r")  # Parent folder 父文件夹

Absolute path绝对路径:

f = open("C:/users/data.txt", "r")  # Full path 完整路径

File Dialogs 文件对话框

from tkinter.filedialog import askopenfilename
 
infileName = askopenfilename()
if infileName:
    infile = open(infileName, "r")
    # Process file 处理文件

17. File Methods 文件方法

Reading Methods 读取方法

# printfile.py
def main():
    fname = input("Enter filename: ")
    infile = open(fname, 'r')
    data = infile.read()      # Read entire file 读取整个文件
    print(data)
    infile.close()

main()

Reading options读取选项:

  • read(): Entire file as string 整个文件作为字符串
  • readline(): Next line including newline 下一行包括换行符
  • readlines(): List of all lines 所有行的列表
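
The three reading methods can be compared on a small throwaway file. The sketch below creates its own temporary file so that it is self-contained; the file contents are made up.

```python
import os
import tempfile

# Create a small sample file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\nbeta\ngamma\n")
    path = tmp.name

# read(): the whole file as one string.
with open(path, "r") as infile:
    whole = infile.read()

# readline() and readlines() continue from the current position.
with open(path, "r") as infile:
    first_line = infile.readline()   # includes the newline
    second_line = infile.readline()
    rest = infile.readlines()        # remaining lines as a list
    at_end = infile.readline()       # empty string at end of file

os.remove(path)
print(repr(whole))
print(repr(first_line), repr(second_line), rest, repr(at_end))
```

An empty string from `readline()` signals the end of the file; a blank line in the file would come back as `"\n"` instead.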

Reading Examples 读取示例

# Read first 5 lines 读取前5行
infile = open(someFile, "r")
for i in range(5):
    line = infile.readline()
    print(line.rstrip("\n"))  # Strip the trailing newline, if there is one 去除换行符
infile.close()
 
# Read all lines using readlines 使用readlines读取所有行
infile = open(someFile, "r")
for line in infile.readlines():
    print(line.rstrip())  # Remove trailing whitespace 去除尾部空白
infile.close()
 
# Treat file as sequence 将文件视为序列
infile = open(someFile, "r")
for line in infile:
    print(line, end="")  # Line already has newline 行已有换行符
infile.close()

Writing Methods 写入方法

outfile = open("mydata.out", "w")
print("Hello world!", file=outfile)
outfile.close()
 
# Using write method 使用write方法
outfile = open("data.txt", "w")
outfile.write("Hello world!\n")
outfile.close()
 
# Using writelines 使用writelines
lines = ["Line 1\n", "Line 2\n", "Line 3\n"]
outfile = open("data.txt", "w")
outfile.writelines(lines)
outfile.close()

File Position 文件位置

# tell() - get current position 获取当前位置
position = file.tell()
 
# seek() - move cursor 移动光标
file.seek(offset, whence=0)
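# whence: 0=beginning (default), 1=current, 2=end
# In text mode, only seeks relative to the beginning are supported (plus seek(0, 2) to jump to the end)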
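
A minimal sketch of `tell()` and `seek()` on a temporary text file follows. It only seeks to positions previously returned by `tell()` or to 0, which is the portable usage for files opened in text mode.

```python
import os
import tempfile

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Hello world!\n")
    path = tmp.name

with open(path, "r") as f:
    start = f.tell()         # position at the beginning of the file
    greeting = f.read(6)     # 'Hello '
    mark = f.tell()          # remember where 'world' starts
    word = f.read(5)
    f.seek(mark)             # jump back to the remembered position
    word_again = f.read(5)
    f.seek(0)                # back to the beginning
    from_start = f.read(5)

os.remove(path)
print(start, repr(greeting), word, word_again, from_start)
```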

Using with Statement 使用with语句

# Automatically closes file 自动关闭文件
with open('data.txt', 'r') as f:
    data = f.read()
    # File automatically closed here 文件在此自动关闭

18. CSV Files CSV文件

CSV Processing CSV处理

  • Tabular data with records and fields 具有记录和字段的表格数据
  • Fields separated by commas or tabs 字段由逗号或制表符分隔

Using csv Module 使用csv模块

import csv
 
# Writing CSV 写入CSV
with open('data.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Name', 'Age', 'City'])
    writer.writerow(['John', '25', 'New York'])
 
# Reading CSV 读取CSV
with open('data.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
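
One reason to prefer the csv module over splitting on commas by hand is that a field may itself contain a comma. The sketch below uses an in-memory `io.StringIO` buffer instead of a real file, and the sample row is invented.

```python
import csv
import io

# Write two rows; the name field contains a comma.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Name", "Age", "City"])
writer.writerow(["Smith, John", "25", "New York"])
raw_text = buffer.getvalue()

# csv.reader understands the quoting that csv.writer added.
rows = list(csv.reader(io.StringIO(raw_text)))

# A naive split breaks the quoted field into two pieces.
naive = raw_text.splitlines()[1].split(",")

print(rows[1])
print(naive)
```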

Using pandas (Third-party) 使用pandas(第三方)

import pandas as pd
 
# Read CSV
df = pd.read_csv('data.csv')
 
# Write CSV
df.to_csv('output.csv', index=False)

18.1 Example Program: Batch Usernames 示例程序:批量用户名

Task任务

  • Process input file with first and last names 处理包含名字和姓氏的输入文件
  • Generate system usernames 生成系统用户名
  • Format: first character of first name + up to 7 of last name 格式:名字首字母 + 最多7个姓氏字母

Implementation实现

def main():
    # Get file names
    infileName = input("What file are the names in? ")
    outfileName = input("What file should the usernames go in? ")
    
    # Open files
    infile = open(infileName, 'r')
    outfile = open(outfileName, 'w')
    
    # Process each line
    for line in infile:
        # Get first and last names from line
        first, last = line.split()
        # Create username
        uname = (first[0] + last[:7]).lower()
        # Write to output file
        print(uname, file=outfile)
    
    # Close files
    infile.close()
    outfile.close()
    print("Usernames have been written to", outfileName)

main()

18.2 Exercise: Stock Data Processing 练习:股票数据处理

Tasks任务

  1. Read and parse file 读取和解析文件

    • Remove extra spaces 去除多余空格
    • Standardize date format to YYYY-MM-DD 标准化日期格式为YYYY-MM-DD
  2. String processing 字符串处理

    • Convert company code to uppercase 公司代码转换为大写
    • Replace / with - in dates 日期中/替换为-
  3. Basic analysis 基本分析

    • Find highest/lowest closing prices 查找最高/最低收盘价
    • Calculate average closing price 计算平均收盘价
  4. User input 用户输入

    • Prompt for company code 提示输入公司代码
    • Analyze data for specified company 分析指定公司的数据
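
A possible starting point for the string-processing and analysis steps is sketched below. The exercise does not specify the file layout, so the record format "code, date, closing price" and the sample values are purely hypothetical, as are the function names.

```python
# Hypothetical sample records (assumed layout: code, date, closing price).
raw_lines = [
    "  aapl , 2024/01/03 , 184.25 ",
    "AAPL,2024/01/04,181.91",
    " msft, 2024/01/03, 370.60",
]


def parse_record(line):
    """Strip extra spaces, uppercase the code, and use '-' in the date."""
    code, date, close = [field.strip() for field in line.split(",")]
    return code.upper(), date.replace("/", "-"), float(close)


def closing_stats(records, company):
    """Return (highest, lowest, average) close for a company, or None."""
    closes = [close for code, _, close in records if code == company.upper()]
    if not closes:
        return None
    return max(closes), min(closes), sum(closes) / len(closes)


records = [parse_record(line) for line in raw_lines]
stats = closing_stats(records, "aapl")
print(records[0])
print(stats)
```

In a full solution the lines would come from the input file and the company code from `input()`.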
