Chapter 5

1. Objectives

To understand the string and list data types and how they are represented
To become familiar with operations on strings and lists through built-in functions and methods
To be able to apply string formatting to produce attractive output

2. Sequence

What is a Sequence?

A positionally ordered collection of items
Any item can be referred to by its index number

Necessity of Sequence

Words in a document
Students in a course
Experimental data
Business customers

Types of Sequences

Strings: 'hello world!'
Lists: ['h','e','l','l','o',' ','w','o','r','l','d','!']
Files

Sequence Characteristics

Homogeneous: All elements have the same type (e.g., a string holds only characters)
Heterogeneous: Elements may have mixed types (e.g., a list)
Homogeneous sequences are generally more efficient to store and process

Iterables

Any sequence is iterable
Iterables are more general than sequences
An iterable yields its elements one at a time
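
The difference between a sequence and a more general iterable can be sketched as follows. This is a minimal illustration, not part of the original notes: the variable names are made up, and a set is used as an example of an iterable that is not a sequence.

```python
# A string and a list are sequences: they support indexing and len().
word = "spam"
letters = list(word)          # ['s', 'p', 'a', 'm']
first_char = word[0]
first_item = letters[0]

# A set is iterable but not a sequence: it can be looped over,
# but it has no positions, so indexing it raises TypeError.
unique = set(word)
collected = sorted(ch for ch in unique)

try:
    unique[0]
    set_is_indexable = True
except TypeError:
    set_is_indexable = False
```

Both the string and the list answer word[0]-style requests, while the set can only hand out its elements one at a time.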

3. Data Type: String

String Basics

Sequence of characters enclosed in quotation marks
Use " " or ' ' for single-line strings
Use """ """ or ''' ''' for multi-line strings

>>> firstName = input("Please enter your name: ")
Please enter your name: John
>>> print("Hello", firstName)
Hello John

String Indexing

Forward index: 0, 1, 2, ...
Reverse index: -1, -2, -3, ...

text = "Hello World"
print(text[0])    # 'H'
print(text[-1])   # 'd'

Notes:

An out-of-range index raises an IndexError
An index must be an integer
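
The two notes above can be checked with a small sketch. The helper safe_index is a hypothetical name introduced only for this illustration; it reports which error, if any, an index produces.

```python
text = "Hello World"

def safe_index(s, i):
    """Return s[i], or the name of the error that indexing raised."""
    try:
        return s[i]
    except IndexError:
        return "IndexError"
    except TypeError:
        return "TypeError"

last_valid = safe_index(text, len(text) - 1)   # last character, 'd'
too_far = safe_index(text, len(text))          # index 11 is out of range
not_int = safe_index(text, 1.0)                # a float is not a valid index
```

Valid forward indexes run from 0 to len(text) - 1, and valid reverse indexes from -1 to -len(text).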

String Slicing

<string>[<start>:<end>:<step>]

text = "Hello World"
print(text[0:5])     # 'Hello'
print(text[6:])      # 'World' (unlike text[6], which is the single character 'W', text[6:] runs from index 6 to the end)
print(text[::2])     # 'HloWrd' (indices 0, 2, 4, ...)
print(text[::-1])    # 'dlroW olleH' (reversed: indices -1, -2, -3, ...)
print(text[::-2])    # 'drWolH' (indices -1, -3, -5, ...)

4. Simple String Processing

Username Generation

First letter of first name + up to 7 letters from last name
Example: Zaphod Beeblebrox → zbeebleb

# Username generation program
firstName = input("Please enter your first name (all lowercase): ")
lastName = input("Please enter your last name (all lowercase): ")
uname = firstName[0] + lastName[:7]
print("uname =", uname)

Month Abbreviation

# Program to get month abbreviation
months = "JanFebMarAprMayJunJulAugSepOctNovDec"
n = int(input("Enter a month number (1-12): "))
pos = (n-1) * 3
monthAbbrev = months[pos:pos+3]
print("The month abbreviation is", monthAbbrev)

Weakness: Only works when every output has the same length (here, three characters)

5. String Operators

Concatenation and Repetition Operators

Concatenation: +
Repetition: *

>>> "spam" + "eggs"
'spameggs'
>>> 3 * "spam"
'spamspamspam'
>>> (3 * "spam") + ("eggs" * 5)
'spamspamspameggseggseggseggseggs'

len() function and iteration:

>>> len("spam")
4
>>> for ch in "Spam!":
...     print(ch, end=" ")
...
S p a m !

Comparison Operators

Based on character codes (Unicode code points, which match ASCII for English letters)
Compare character by character

>>> "apple" < "banana"
True
>>> "Apple" < "apple"  # 'A' (65) < 'a' (97)
True

Membership Operators

in and not in
Return Boolean values

>>> 'a' in 'apple'
True
>>> 'seed' not in 'apple'
True

Formatting Operators

'%2d/%2d string' % (var1, var2)

Format specifiers:

%d: Decimal integer
%2d: Width 2, right-justified, space-filled
%02d: Width 2, right-justified, zero-filled
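
A short sketch of how the specifiers above differ in practice. The date values and variable names are made up for illustration.

```python
month, day, year = 3, 7, 2024

# %2d pads with spaces; %02d pads with zeros. Both right-justify in width 2.
spaced = "%2d/%2d/%d" % (month, day, year)
padded = "%02d/%02d/%d" % (month, day, year)

print(repr(spaced))   # ' 3/ 7/2024'
print(repr(padded))   # '03/07/2024'
```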

6. String Representation

Character Encoding

ASCII: 7-bit codes, 128 characters (0-127), US-centric
Extended ASCII: 8-bit codes that add further characters
Unicode: Universal standard, 100,000+ characters
UTF-8: Unicode Transformation Format, a variable-length encoding of Unicode that uses 1 to 4 bytes per character
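
The relationship between Unicode code points and UTF-8 bytes can be sketched with str.encode and bytes.decode. The sample characters below are chosen only for illustration and are written as escapes so the snippet does not depend on the file's own encoding.

```python
latin_a = "A"            # code point 65, also an ASCII character
e_acute = "\u00e9"       # 'e' with an acute accent, code point 233
cjk_char = "\u4e2d"      # a Chinese character, code point 20013

# UTF-8 uses more bytes for larger code points.
a_bytes = latin_a.encode("utf-8")       # 1 byte
e_bytes = e_acute.encode("utf-8")       # 2 bytes
cjk_bytes = cjk_char.encode("utf-8")    # 3 bytes

# Decoding the bytes gives back the original string.
round_trip = cjk_bytes.decode("utf-8")
```

ord() reports the code point, which is independent of how many bytes an encoding uses to store it.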

Character Functions

>>> ord("A")    # Get numeric code
65
>>> ord("a")
97
>>> chr(97)     # Get character from code
'a'
>>> chr(65)
'A'

7. String Formatting

Format Method

<template-string>.format(<values>)

total = 1.5
print("The total value of your change is ${0:0.2f}".format(total))
# Output: The total value of your change is $1.50

Format specifier: <width>.<precision><type>

0.2f: Width 0 (use only as much space as needed), precision 2, fixed-point number

Examples

>>> "Hello {0} {1}, you may have won ${2}".format("Mr.", "Smith", 10000)
'Hello Mr. Smith, you may have won $10000'
 
>>> 'This int, {0:5}, was placed in a field of width 5'.format(7)
'This int,     7, was placed in a field of width 5'
 
>>> 'This float, {0:10.5f}, is fixed at 5 decimal places'.format(3.1415926)
'This float,    3.14159, is fixed at 5 decimal places'

Justification

>>> "left: {0:<5}".format("Hi!")
'left: Hi!  '
>>> "right: {0:>5}".format("Hi!")
'right:   Hi!'
>>> "centered: {0:^5}".format("Hi!")
'centered:  Hi! '

8. Programming an Encoder

Encoding Algorithm

# Encode message to Unicode numbers
message = input("Please enter the message to encode: ")
print("Here are the Unicode codes:")
 
for ch in message:
    print(ord(ch), end=" ")
print()  # End the line of codes

Decoding Algorithm

# Decode Unicode numbers to message
inString = input("Please enter the Unicode-encoded message: ")
 
message = ""
for numStr in inString.split():
    codeNum = int(numStr)
    message = message + chr(codeNum)
 
print("\nThe decoded message is:", message)

Improved Decoder with Lists

# More efficient decoder using lists
inString = input("Please enter the Unicode-encoded message: ")

chars = []  # Create empty list
for numStr in inString.split():
    codeNum = int(numStr)
    chars.append(chr(codeNum))  # Append to list

message = "".join(chars)  # Join list into string
print("\nThe decoded message is:", message)

9. String Methods

Common String Methods

s.capitalize()      # First character capitalized
s.title()           # Each word capitalized
s.center(width)     # Center in a field of the given width
s.count(sub)        # Count occurrences of sub
s.find(sub)         # Position of first occurrence of sub, or -1 if absent
s.join(list)        # Join list items, using s as the separator
s.lower()           # Convert to lowercase
s.upper()           # Convert to uppercase

Stripping Methods

s.strip()           # Remove leading/trailing whitespace
s.lstrip()          # Remove leading whitespace
s.rstrip()          # Remove trailing whitespace

Replacement and Splitting

s.replace(old, new)     # Replace substrings
s.split()               # Split into list

Validation Methods

s.islower()         # All cased characters lowercase?
s.isupper()         # All cased characters uppercase?
s.isalpha()         # All characters alphabetic?
s.isdigit()         # All characters digits?
s.isalnum()         # All characters alphanumeric?

Advanced Methods

# Find with optional start and end positions (passed positionally)
s.find(sub, start, end)

# Join the items of a sequence of strings, using s as the separator
s.join(sequence)

# Count non-overlapping occurrences
s.count(substring)
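
A few of the methods listed above, applied to one sample string. This is only an illustrative sketch; the sample text and variable names are made up.

```python
raw = "  hello, python world  "

clean = raw.strip()                        # drop leading/trailing spaces
title = clean.title()                      # capitalize each word
words = clean.replace(",", "").split()     # remove the comma, then split on whitespace
o_count = clean.count("o")                 # occurrences of 'o'
pos = clean.find("python")                 # index where 'python' starts
missing = clean.find("java")               # -1 because 'java' is absent
dashed = "-".join(words)                   # join the words with '-'
centered = "hi".center(6)                  # pad both sides to width 6

print(clean, title, words, o_count, pos, missing, dashed, repr(centered))
```

None of these calls changes raw or clean; each returns a new value, which is why the results are assigned to new names.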

10. From Encoding to Encryption

Encryption

Process of encoding information to keep it secret
Cryptography: Study of encryption methods

Real-world Application

Used when transmitting credit card details
Protects personal information

11. Data Type: List

List Basics

Sequences of arbitrary values
Can contain mixed data types

myList = [1, "Spam", 4, "U"]  # Mixed list
myList = []  # Empty list
myList = [Point(1,1), Point(1,2)]  # Objects (assumes a Point class is defined or imported)

student = [2020310912, 'Xiaoming', 'Male', 'China',
          ['Information School', 'Computer Science', 20020501]]  # Nested list

List Indexing and Slicing

student = [2020310912, 'Xiaoming', 'Male', 'China','Information School', 'Computer Science', 20020501]

print(student[1])           # 'Xiaoming'
print(student[4])           # 'Information School'
print(student[:-3])         # [2020310912, 'Xiaoming', 'Male', 'China'] (from index 0 up to, but not including, the third-from-last element)
print(student[1:5:2])       # ['Xiaoming', 'China'] (slice with step 2)
print(student[:-3:-1])      # [20020501, 'Computer Science'] (from index -1 backwards, stopping before the third-from-last element)

# Nested indexing: index into the element itself
print(student[4][0])        # 'I' from 'Information School'

12. Immutable vs. Mutable

Immutable Objects

Numbers, strings, tuples
Cannot be changed in place
An operation that appears to modify one creates a new object

Mutable Objects

Lists, dictionaries, sets
Can be changed in place
The object's identity (as reported by id()) remains the same
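
The contrast can be made concrete with a second name bound to the same object. This is a minimal sketch with made-up variable names: if "modifying" the object rebinds the first name to a new object, the alias still sees the old value; if the object is changed in place, the alias sees the change.

```python
# Immutable: concatenation builds a new string; the alias keeps the old one.
s = "spam"
s_alias = s
s = s + "!"
string_shared = s is s_alias        # False: s now names a different object

# Item assignment on a string is not allowed at all.
try:
    s_alias[0] = "S"
    string_item_assignment_ok = True
except TypeError:
    string_item_assignment_ok = False

# Mutable: append changes the list in place; the alias sees the change.
nums = [1, 2, 3]
nums_alias = nums
id_before = id(nums)
nums.append(4)
list_shared = nums is nums_alias    # True: still the same object
same_id = id(nums) == id_before     # True: identity is unchanged
```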

13. Lists as Sequences

List Operations

>>> [1,2] + [3,4]          # Concatenation
[1, 2, 3, 4]
>>> [1,2] * 3              # Repetition
[1, 2, 1, 2, 1, 2]
>>> grades = ['A', 'B', 'C', 'D', 'F']
>>> grades[0]               # Indexing
'A'
>>> grades[2:4]             # Slicing
['C', 'D']
>>> len(grades)             # Length
5

Month Program with Lists

# Improved month program using lists
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", 
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
 
n = int(input("Enter a month number (1-12): "))
monthAbbrev = months[n-1]
print("The month abbreviation is", monthAbbrev)
 
# Can easily extend to full names, since list items need not have the same length
full_months = ["January", "February", "March", "April", 
               "May", "June", "July", "August", 
               "September", "October", "November", "December"]

14. List Methods

append() Method

# Create list of squares
squares = []
for i in range(1, 101):
    squares.append(i * i)
print(squares)  # [1, 4, 9, ..., 10000]

join() Method (a string method that takes a list of strings)

# Join list into string
words = ['Hello', 'world', '!']
message = " ".join(words)  # 'Hello world !'

14.1 Exercises

Morse Code

Task: Convert English sentence to Morse code

morse_code = [".-","-...","-.-.","-..",".","..-.","--.","....",
              "..",".---","-.-",".-..","--","-.","---",".--.",
              "--.-",".-.","...","-","..-","...-",".--","-..-",
              "-.--","--.."]
 
# Convert letters to Morse code
# Use list indexing based on the letter's position in the alphabet
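
One possible way to approach the exercise is sketched below. It assumes conventions the task does not specify: letters within a word are separated by a space, words by " / ", and anything other than A-Z is skipped. The function name to_morse is hypothetical.

```python
MORSE_CODE = [".-", "-...", "-.-.", "-..", ".", "..-.", "--.", "....",
              "..", ".---", "-.-", ".-..", "--", "-.", "---", ".--.",
              "--.-", ".-.", "...", "-", "..-", "...-", ".--", "-..-",
              "-.--", "--.."]

def to_morse(sentence):
    """Convert the letters of a sentence to Morse code (assumed separators)."""
    coded_words = []
    for word in sentence.upper().split():
        # ord(ch) - ord("A") is the letter's position: 0 for A, 25 for Z.
        codes = [MORSE_CODE[ord(ch) - ord("A")]
                 for ch in word if "A" <= ch <= "Z"]
        if codes:
            coded_words.append(" ".join(codes))
    return " / ".join(coded_words)

print(to_morse("SOS"))        # ... --- ...
print(to_morse("Hi there"))   # .... .. / - .... . .-. .
```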

Caesar Cipher

Task: Implement Caesar encryption with variable shift

# Caesar cipher implementation (shifts uppercase A-Z; other characters pass through unchanged)
plaintext = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
shift = 19
ciphertext = ""

for char in plaintext:
    if "A" <= char <= "Z":
        # Apply shift and handle wrap-around
        shifted = (ord(char) - ord('A') + shift) % 26
        ciphertext += chr(shifted + ord('A'))
    else:
        ciphertext += char

print("Ciphertext:", ciphertext)
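
Since the task asks for a variable shift, the same idea can be wrapped in a function that takes the shift as a parameter. This is a sketch rather than a reference solution: the name caesar is hypothetical, and handling lowercase letters is an added assumption. Decryption is simply a shift in the opposite direction.

```python
def caesar(text, shift):
    """Shift each letter by `shift` positions, wrapping around the alphabet."""
    result = []
    for ch in text:
        if "A" <= ch <= "Z":
            base = ord("A")
        elif "a" <= ch <= "z":
            base = ord("a")
        else:
            result.append(ch)      # leave non-letters unchanged
            continue
        # % 26 handles wrap-around, also for negative shifts
        result.append(chr((ord(ch) - base + shift) % 26 + base))
    return "".join(result)

secret = caesar("Hello, World!", 3)
restored = caesar(secret, -3)
print(secret)     # Khoor, Zruog!
print(restored)   # Hello, World!
```

Collecting characters in a list and joining them at the end follows the same pattern as the improved decoder in section 8.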

15. Files: Multi-line Strings

File Basics

Sequence of data stored in secondary storage (e.g., on disk)
A text file typically contains multiple lines of text
Uses newline character \n for line breaks

Multi-line Strings

multi_line = """This is a
multi-line
string"""
print(multi_line)

16. File Processing

File Operations

Opening: Associate a file on disk with a file object in memory
Processing: Read from or write to the file
Closing: Complete pending operations and bookkeeping (e.g., flush buffered writes)

Opening Files

<filevar> = open(<name>, <mode>)

Modes:

'r': Read (the default)
'w': Write (creates the file, or overwrites an existing one)
'a': Append (writes are added at the end)
'r+': Read and write (the file must already exist)

infile = open("numbers.dat", "r")
outfile = open("output.txt", "w")
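
The difference between 'w' and 'a' can be seen by writing to the same file several times. This sketch uses a temporary directory and a made-up file name so it does not touch any real data; it also uses the with statement, which closes the file automatically.

```python
import os
import tempfile

# Hypothetical scratch file inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

with open(path, "w") as f:       # 'w' creates the file
    f.write("first\n")
with open(path, "a") as f:       # 'a' adds to the end
    f.write("second\n")
with open(path, "r") as f:
    after_append = f.read()

with open(path, "w") as f:       # 'w' again discards the old contents
    f.write("replaced\n")
with open(path, "r") as f:
    after_overwrite = f.read()

print(repr(after_append))        # 'first\nsecond\n'
print(repr(after_overwrite))     # 'replaced\n'
```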

File Paths

Relative path:

f = open("../data.txt", "r")  # File in the parent folder

Absolute path:

f = open("C:/users/data.txt", "r")  # Full path

File Dialogs

from tkinter.filedialog import askopenfilename
 
infileName = askopenfilename()
if infileName:
    infile = open(infileName, "r")
    # Process file (askopenfilename returns an empty string if the dialog is cancelled)

17. File Methods

Reading Methods

# printfile.py
def main():
    fname = input("Enter filename: ")
    infile = open(fname, 'r')
    data = infile.read()      # Read entire file
    print(data)
    infile.close()

Reading options:

read(): Entire remaining file contents as a single string
readline(): Next line, including its newline character (an empty string at end of file)
readlines(): List of all remaining lines

Reading Examples

# someFile is a placeholder for a file name string

# Read first 5 lines
infile = open(someFile, "r")
for i in range(5):
    line = infile.readline()
    print(line.rstrip("\n"))  # Strip the trailing newline, if any
infile.close()

# Read all lines using readlines
infile = open(someFile, "r")
for line in infile.readlines():
    print(line.rstrip())  # Remove trailing whitespace, including the newline
infile.close()

# Treat file as a sequence of lines
infile = open(someFile, "r")
for line in infile:
    print(line, end="")  # Line already ends with a newline
infile.close()

Writing Methods

outfile = open("mydata.out", "w")
print("Hello world!", file=outfile)
outfile.close()
 
# Using write method (write does not add a newline automatically)
outfile = open("data.txt", "w")
outfile.write("Hello world!\n")
outfile.close()
 
# Using writelines (each string must supply its own newline)
lines = ["Line 1\n", "Line 2\n", "Line 3\n"]
outfile = open("data.txt", "w")
outfile.writelines(lines)
outfile.close()

File Position

# tell() - get current position (file is a placeholder for an open file object)
position = file.tell()

# seek() - move the file position
file.seek(offset, whence=0)
# whence: 0=beginning, 1=current, 2=end
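
A small sketch of tell() and seek() on a scratch file. The file name is made up, and the file is opened in binary mode because text-mode files only allow seeking relative to the beginning (apart from seek(0, 1) and seek(0, 2)).

```python
import os
import tempfile

# Hypothetical scratch file containing 13 bytes.
path = os.path.join(tempfile.mkdtemp(), "position_demo.txt")
with open(path, "wb") as f:
    f.write(b"Hello world!\n")

with open(path, "rb") as f:
    start = f.tell()          # position 0 right after opening
    first = f.read(5)         # reads b'Hello'
    after_read = f.tell()     # position is now 5
    f.seek(6)                 # whence defaults to 0: offset from the beginning
    word = f.read(5)          # reads b'world'
    f.seek(-7, 2)             # whence=2: 7 bytes before the end
    tail = f.read()           # reads b'world!\n'
```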

Using with Statement

# Automatically closes file
with open('data.txt', 'r') as f:
    data = f.read()
# File is automatically closed once the with block ends

18. CSV Files

CSV Processing

Tabular data with records and fields
Fields separated by a delimiter, usually a comma (tabs are a common variant)

Using csv Module

import csv
 
# Writing CSV
with open('data.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Name', 'Age', 'City'])
    writer.writerow(['John', '25', 'New York'])
 
# Reading CSV (the csv module documentation recommends newline='' here too)
with open('data.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)

Using pandas (Third-party)

import pandas as pd
 
# Read CSV
df = pd.read_csv('data.csv')
 
# Write CSV
df.to_csv('output.csv', index=False)

18.1 Example Program: Batch Usernames

Task

Process input file with first and last names
Generate system usernames
Format: first character of first name + up to 7 characters of last name

Implementation

def main():
    # Get file names
    infileName = input("What file are the names in? ")
    outfileName = input("What file should the usernames go in? ")
    
    # Open files
    infile = open(infileName, 'r')
    outfile = open(outfileName, 'w')
    
    # Process each line
    for line in infile:
        # Get first and last names from line
        first, last = line.split()
        # Create username
        uname = (first[0] + last[:7]).lower()
        # Write to output file
        print(uname, file=outfile)
    
    # Close files
    infile.close()
    outfile.close()
    print("Usernames have been written to", outfileName)

18.2 Exercise: Stock Data Processing

Tasks

Read and parse file
- Remove extra spaces
- Standardize date format to YYYY-MM-DD
String processing
- Convert company code to uppercase
- Replace / with - in dates
Basic analysis
- Find highest/lowest closing prices
- Calculate average closing price
User input
- Prompt for company code
- Analyze data for specified company
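
A possible starting point for the exercise is sketched below. The exercise does not specify the file layout, so this assumes each line holds three comma-separated fields (company code, date written year-first with slashes, closing price). The sample lines, company codes, and prices are invented purely for illustration, and the function names are hypothetical. The input() prompt and the file reading are left out so the sketch runs unattended.

```python
def parse_record(line):
    """Split one 'code,date,close' line (assumed layout) into clean fields."""
    code, date, close = [field.strip() for field in line.split(",")]
    return code.upper(), date.replace("/", "-"), float(close)

def summarize(lines, company):
    """Return (highest, lowest, average) closing price, or None if no data."""
    wanted = company.strip().upper()
    prices = []
    for line in lines:
        if not line.strip():
            continue                      # skip blank lines
        code, _date, close = parse_record(line)
        if code == wanted:
            prices.append(close)
    if not prices:
        return None
    return max(prices), min(prices), sum(prices) / len(prices)

# Made-up sample data with inconsistent spacing and letter case.
sample_lines = [
    "  abc , 2024/01/03 , 10.0 ",
    "ABC,2024/01/04,14.0",
    "xyz, 2024/01/03, 50.0",
    " abc,2024/01/05 ,12.0",
]
result = summarize(sample_lines, "abc")
print(result)   # (14.0, 10.0, 12.0)
```

In a full solution, the lines would come from iterating over an open file, and the company code from input().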

LeoJellyfish
