Speed-Copying Annotations from Scripta Sinica (A Tip for Mac Users)

The Scripta Sinica Database 漢籍電子文獻資料庫 is a treasure house for scholars of premodern China, especially those fortunate enough to have an institutional subscription to its proprietary portions. I use this database regularly, but occasionally I run into a small technical problem when copying texts from it (in my case, always as plain text): what to do with the occasional interlinear annotations, like the following:

Scripta 1

Scripta 2

If I copy the text before expanding the annotations, I get the main text but not the annotations. If I copy everything after expanding them, I get the main text and the annotations jumbled together, which is very confusing to read.

Ideally, I want to save a copy of the main text followed by all annotations at the bottom, like this:

○庚辰○金吾左衛百戶吳鎮奏抽太平安慶廬州淮楊常鎮等處商貨船稅奉旨南直沿江一帶往來船隻遺稅每年可得銀八萬兩有禆囯用著暨祿不妨原務帶管督率原奏官員吳鎮為首土民錢文明前去會撫按徵收觧進不許侵越欽関疆界重疊徵徵收困累商民載入廬州等府勑內鎮又條議五事一定疆界一定舡料則例一抽興販客米一抽木板枋柴炭一抽歲改叚絹布疋上命遵前旨行給錢文明官帶
南直:抱本直下有隸字。
欽關疆界:廣本抱本欽作鈔,是也。
重疊徵徵收:應刪一徵字。
舡料:廣本抱本舡作船。
官帶:廣本抱本官作冠,是也。

Until now, I had always copy-pasted the annotations one by one. Recently, however, I grew irritated by how time-consuming this was, and I started asking: How can I copy all the annotations at once and save my precious time?

Fortunately, it turns out that there is an easy trick for Mac users. (I’m sure similar solutions are available on Windows. If anyone is aware of one, please let me know.)

  1. After expanding the annotations, copy everything and paste it into TextEdit:
    Scripta 3
  2. Select one or more characters from any of the annotations:
    Scripta 4
  3. Go to Format > Font > Styles. You should see a window like the following. Click on “Select.”
    Scripta 5
  4. Check “Select by style” (uncheck everything else), and select “Select within entire document” (default setting). Now click on “Select.”
    Scripta 6
  5. All the annotations should be automatically selected. Copy and paste into your working file, and voilà, you have all the annotations. Happy copying!
    Scripta 7
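The trick works because the expanded annotations carry a different text style from the main text, which is what “Select by style” picks up. For readers who prefer a script, the same idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a tool for Scripta Sinica itself: it assumes you already have the text as a list of (text, style) runs, and the run format and the style labels "body" and "note" are hypothetical. Extracting such runs from a real copy (for instance, from an RTF or HTML export) is not shown.

```python
from typing import Iterable, List, Tuple

# A "run" is a stretch of text in a single style, e.g. ("奉旨南直", "body").
# Hypothetical format: the style labels are whatever your export uses.
Run = Tuple[str, str]


def split_by_style(runs: Iterable[Run], note_style: str) -> Tuple[str, List[str]]:
    """Separate main text from annotations by style, in the spirit of
    TextEdit's "Select by style" option."""
    main_parts: List[str] = []
    notes: List[str] = []
    prev_was_note = False
    for text, style in runs:
        if style == note_style:
            if prev_was_note:
                # Consecutive runs in the note style are joined into one note.
                notes[-1] += text
            else:
                notes.append(text)
            prev_was_note = True
        else:
            main_parts.append(text)
            prev_was_note = False
    return "".join(main_parts), notes


def main_then_notes(runs: Iterable[Run], note_style: str = "note") -> str:
    """Main text first, then one annotation per line at the bottom."""
    main, notes = split_by_style(runs, note_style)
    return "\n".join([main, *notes])


if __name__ == "__main__":
    # Illustrative runs built from the passage quoted above.
    sample = [
        ("奉旨南直", "body"),
        ("南直:抱本直下有隸字。", "note"),
        ("沿江一帶", "body"),
    ]
    print(main_then_notes(sample))
```

Run on the sample, this prints 奉旨南直沿江一帶 on the first line and 南直:抱本直下有隸字。 on the second, matching the “main text, then annotations” layout shown earlier. Joining consecutive note-style runs keeps a note together if its style is split into pieces; the trade-off is that two different annotations with no main text between them would end up on the same line.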

Downloading Siku quanshu Text Files

I’m sure most of us who have worked with the Siku quanshu 四庫全書 database have dreamed of extracting the texts of whole books without having to copy them page by page. It turns out that some books are indeed available as text files here. The list is not complete, but it includes quite a few books from the History (史), Philosophy (子), and Literature (集) sections. Some books from the Classics (經) section are also available, though not many.

If you are lucky enough to find the title you are interested in, you may need to convert the encoding of the downloaded file before the text displays properly. A tool that I have found handy is Encoding Master: use it to open the file, then convert from DOS Chinese Simplified (GBK) to UTF-8. Now you have a clean text file of your favorite book!
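If you would rather not install a separate tool, the same GBK-to-UTF-8 conversion can be done with a short Python script. This is a minimal sketch assuming the downloaded file really is GBK, as described above; the function name and file names are hypothetical.

```python
from pathlib import Path


def gbk_to_utf8(src: str, dst: str, source_encoding: str = "gbk") -> None:
    """Re-encode a GBK text file as UTF-8.

    Decoding is strict (Python's default), so a file that is not valid
    GBK raises UnicodeDecodeError instead of producing garbled text.
    Passing source_encoding="gb18030" (a superset of GBK) may help if a
    few characters fail to decode.
    """
    # read_text uses universal newlines, so DOS-style CRLF line endings
    # come back as "\n"; write_text then writes the platform's default.
    text = Path(src).read_text(encoding=source_encoding)
    Path(dst).write_text(text, encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical file names for a downloaded book.
    gbk_to_utf8("book.txt", "book-utf8.txt")
```

On a Mac, the built-in command-line tool does the same job: `iconv -f GBK -t UTF-8 book.txt > book-utf8.txt`.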

Some points of caution:

  • The text is in simplified Chinese.
  • This comes from an online forum, so it could disappear at any time (although it has apparently been there for over a year now).
  • Not all files are from the Siku quanshu. (Read the disclaimer in Paragraph #6 at the top of the page.) In any case, the files come from entirely unknown sources, so they are about as reliable as Wikipedia.
  • Depending on your view of the copyright status of digitized old texts, you might feel guilty using these files.

This is the most comprehensive list of clean Siku quanshu texts that I have seen so far. If anyone knows of a better source, I’d appreciate the information very much.