.\" Copyright (c) 2007 Tim Kientzle
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd December 23, 2011
.Dt CPIO 5
.Os
.Sh NAME
.Nm cpio
.Nd format of cpio archive files
.Sh DESCRIPTION
The
.Nm
archive format collects any number of files, directories, and other
file system objects (symbolic links, device nodes, etc.) into a single
stream of bytes.
.Ss General Format
Each file system object in a
.Nm
archive comprises a header record with basic numeric metadata
followed by the full pathname of the entry and the file data.
The header record stores a series of integer values that generally
follow the fields in
.Va struct stat .
(See
.Xr stat 2
for details.)
The variants differ primarily in how they store those integers
(binary, octal, or hexadecimal).
The header is followed by the pathname of the
entry (the length of the pathname is stored in the header)
and any file data.
The end of the archive is indicated by a special record with
the pathname
.Dq TRAILER!!! .
.Ss PWB Format
The PWB binary
.Nm
format is the original format, dating from the introduction of cpio as
part of the Programmer's Work Bench system, a variant of 6th Edition UNIX.
It stores numbers as 2-byte and 4-byte binary values.
Each entry begins with a header in the following format:
.Pp
.Bd -literal -offset indent
struct header_pwb_cpio {
    short h_magic;
    short h_dev;
    short h_ino;
    short h_mode;
    short h_uid;
    short h_gid;
    short h_nlink;
    short h_majmin;
    long h_mtime;
    short h_namesize;
    long h_filesize;
};
.Ed
.Pp
The
.Va short
fields here are 16-bit integer values, while the
.Va long
fields are 32-bit integers.
Since PWB UNIX, like the 6th Edition UNIX it was based on, only ran
on PDP-11 computers, they are in PDP-endian format: each short is
little-endian, and each long is stored as two such shorts with the
most significant one first.
That is, the long integer whose hexadecimal
representation is 0x12345678 would be stored in four successive bytes
as 0x34, 0x12, 0x78, 0x56.
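.Pp
As an illustration of this byte order, the following is a minimal C sketch
of decoding both field widths from raw header bytes.
The function names
.Fn le16
and
.Fn pdp32
are hypothetical and are not taken from any existing
.Nm
implementation.

```c
#include <stdint.h>

/* Decode a 16-bit little-endian value from two bytes. */
uint16_t
le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Decode a 32-bit PDP-endian value: the most significant 16-bit word
 * comes first, and each word is itself little-endian.
 */
uint32_t
pdp32(const unsigned char *p)
{
    return ((uint32_t)le16(p) << 16) | le16(p + 2);
}
```

.Pp
Given the four bytes 0x34, 0x12, 0x78, 0x56, this sketch's
.Fn pdp32
yields 0x12345678.
.Pp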
The fields are as follows:
.Bl -tag -width indent
.It Va h_magic
The integer value octal 070707.
.It Va h_dev , Va h_ino
The device and inode numbers from the disk.
These are used by programs that read
.Nm
archives to determine when two entries refer to the same file.
Programs that synthesize
.Nm
archives should be careful to set these to distinct values for each entry.
.It Va h_mode
The mode specifies both the regular permissions and the file type.
It also holds two bits that are irrelevant to the cpio format,
because the field is a raw copy of the mode field in the inode
representing the file.
These are the IALLOC flag, which shows that
the inode entry is in use, and the ILARG flag, which shows that the
file it represents is large enough to have indirect block pointers in
the inode.
The mode is decoded as follows:
.Pp
.Bl -tag -width "MMMMMMM" -compact
.It 0100000
IALLOC flag - irrelevant to cpio.
.It 0060000
This masks the file type bits.
.It 0040000
File type value for directories.
.It 0020000
File type value for character special devices.
.It 0060000
File type value for block special devices.
.It 0010000
ILARG flag - irrelevant to cpio.
.It 0004000
SUID bit.
.It 0002000
SGID bit.
.It 0001000
Sticky bit.
.It 0000777
The lower 9 bits specify read/write/execute permissions
for world, group, and user following standard POSIX conventions.
.El
.It Va h_uid , Va h_gid
The numeric user id and group id of the owner.
.It Va h_nlink
The number of links to this file.
Directories always have a value of at least two here.
Note that hardlinked files include file data with every copy in the archive.
.It Va h_majmin
For block special and character special entries,
this field contains the associated device number, with the major
number in the high byte, and the minor number in the low byte.
For all other entry types, it should be set to zero by writers
and ignored by readers.
.It Va h_mtime
Modification time of the file, indicated as the number
of seconds since the start of the epoch,
00:00:00 UTC January 1, 1970.
.It Va h_namesize
The number of bytes in the pathname that follows the header.
This count includes the trailing NUL byte.
.It Va h_filesize
The size of the file. Note that this archive format is limited to 16
megabyte file sizes, because PWB UNIX, like 6th Edition, only used
an unsigned 24 bit integer for the file size internally.
.El
.Pp
The pathname immediately follows the fixed header.
If
.Va h_namesize
is odd, an additional NUL byte is added after the pathname.
The file data is then appended, again with an additional NUL
appended if needed to get the next header at an even offset.
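.Pp
The padding rule can be expressed as a small calculation.
The following C sketch assumes the 26-byte fixed header implied by the
struct above (nine 16-bit fields and two 32-bit fields); the names
.Dv CPIO_BIN_HEADER_SIZE ,
.Fn cpio_bin_pad
and
.Fn cpio_bin_entry_size
are invented for this example.

```c
#include <stddef.h>

/* Nine 16-bit fields plus two 32-bit fields: 18 + 8 bytes. */
#define CPIO_BIN_HEADER_SIZE 26

/* Round n up to the next even number. */
size_t
cpio_bin_pad(size_t n)
{
    return n + (n & 1);
}

/*
 * Distance from the start of one header to the start of the next,
 * given the h_namesize and h_filesize values of the first header.
 */
size_t
cpio_bin_entry_size(size_t namesize, size_t filesize)
{
    return CPIO_BIN_HEADER_SIZE + cpio_bin_pad(namesize) +
        cpio_bin_pad(filesize);
}
```

.Pp
For instance, the trailer pathname occupies 11 bytes including its NUL,
so it is padded to 12 bytes in this sketch.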
.Pp
Hardlinked files are not given special treatment;
the full file contents are included with each copy of the
file.
.Ss New Binary Format
The new binary
.Nm
format showed up when cpio was adopted into late 7th Edition UNIX.
It is exactly like the PWB binary format, described above, except for
three changes:
.Pp
First, UNIX now ran on more than one hardware type, so the endianness
of 16-bit integers must be determined by observing the magic number at
the start of the header.
The 32-bit integers are still always stored
with the most significant word first, though, so each of the two
.Va long
fields in the struct shown above is stored as an array of two 16-bit
integers, in the traditional order.
Those 16-bit integers, like all the others
in the struct, were accessed using a macro that byte swapped them if
necessary.
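.Pp
One way a reader might observe the magic number is sketched below in C.
Octal 070707 is 0x71C7, so the two possible byte orders are easy to tell
apart.
The enum and the function name
.Fn cpio_bin_byte_order
are hypothetical, chosen only for this example.

```c
#include <stddef.h>

enum cpio_bin_order { CPIO_BIN_UNKNOWN, CPIO_BIN_LE, CPIO_BIN_BE };

/*
 * Guess the byte order of 16-bit values in a binary cpio header
 * from its first two bytes.  The magic number is octal 070707,
 * which is 0x71C7.
 */
enum cpio_bin_order
cpio_bin_byte_order(const unsigned char *hdr, size_t len)
{
    if (len < 2)
        return CPIO_BIN_UNKNOWN;
    if (hdr[0] == 0xC7 && hdr[1] == 0x71)
        return CPIO_BIN_LE;     /* low byte stored first */
    if (hdr[0] == 0x71 && hdr[1] == 0xC7)
        return CPIO_BIN_BE;     /* high byte stored first */
    return CPIO_BIN_UNKNOWN;
}
```

.Pp
This check only determines the order of bytes within 16-bit values;
it does not distinguish the PWB format from the new binary format,
since both use the same magic number.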
.Pp
Next, 7th Edition had more file types to store, and the IALLOC and ILARG
flag bits were re-purposed to accommodate these. The revised use of the
various bits is as follows:
.Pp
.Bl -tag -width "MMMMMMM" -compact
.It 0170000
This masks the file type bits.
.It 0140000
File type value for sockets.
.It 0120000
File type value for symbolic links.
For symbolic links, the link body is stored as file data.
.It 0100000
File type value for regular files.
.It 0060000
File type value for block special devices.
.It 0040000
File type value for directories.
.It 0020000
File type value for character special devices.
.It 0010000
File type value for named pipes or FIFOs.
.It 0004000
SUID bit.
.It 0002000
SGID bit.
.It 0001000
Sticky bit.
.It 0000777
The lower 9 bits specify read/write/execute permissions
for world, group, and user following standard POSIX conventions.
.El
.Pp
Finally, the file size field now represents a signed 32-bit integer in
the underlying file system, so the maximum file size has increased to
2 gigabytes.
.Pp
Note that there is no obvious way to tell which of the two binary
formats an archive uses, other than to see which one makes more
sense.
The typical error scenario is that a PWB format archive
unpacked as if it were in the new format will create named sockets
instead of directories, and then fail to unpack files that should
go in those directories.
Running
.Cm bsdcpio -itv
on an unknown archive will make it obvious which it is: if it is
PWB format, directories will be listed with an 's' instead of
a 'd' as the first character of the mode string, and the larger
files will have a '?' in that position.
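.Pp
The misinterpretation described above can be reproduced with a small
decoder for the file type bits of the new binary format.
This is only a sketch; the function name
.Fn cpio_type_char
is hypothetical, and the type letters simply follow the usual
.Xr ls 1
conventions.

```c
#include <stdint.h>

/*
 * Return an ls-style type character for a mode value, using the
 * file type bits of the new binary format.  Unrecognized type
 * values yield '?'.
 */
char
cpio_type_char(uint32_t mode)
{
    switch (mode & 0170000) {
    case 0140000: return 's';   /* socket */
    case 0120000: return 'l';   /* symbolic link */
    case 0100000: return '-';   /* regular file */
    case 0060000: return 'b';   /* block special */
    case 0040000: return 'd';   /* directory */
    case 0020000: return 'c';   /* character special */
    case 0010000: return 'p';   /* named pipe or FIFO */
    default:      return '?';
    }
}
```

.Pp
A PWB directory carries the IALLOC flag, so its mode looks like 0140755
and decodes as a socket here, while a large PWB file with both IALLOC and
ILARG set (for example 0110644) matches no type at all.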
.Ss Portable ASCII Format
.St -susv2
standardized an ASCII variant that is portable across all
platforms.
It is commonly known as the
.Dq old character
format or as the
.Dq odc
format.
It stores the same numeric fields as the binary formats, but
represents them as 6-character or 11-character octal values.
.Pp
.Bd -literal -offset indent
struct cpio_odc_header {
    char c_magic[6];
    char c_dev[6];
    char c_ino[6];
    char c_mode[6];
    char c_uid[6];
    char c_gid[6];
    char c_nlink[6];
    char c_rdev[6];
    char c_mtime[11];
    char c_namesize[6];
    char c_filesize[11];
};
.Ed
.Pp
The fields are identical to those in the new binary format.
The name and file body follow the fixed header.
Unlike the binary formats, there is no additional padding
after the pathname or file contents.
If the files being archived are themselves entirely ASCII, then
the resulting archive will be entirely ASCII, except for the
NUL byte that terminates the name field.
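.Pp
Because the octal fields have a fixed width and are not NUL-terminated,
a reader has to convert them character by character.
The following C sketch shows one way to do so; the function name
.Fn cpio_odc_field
and its return convention are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Parse a fixed-width octal field such as c_mode[6] or
 * c_filesize[11].  Returns 0 on success, or -1 if a character
 * outside '0'..'7' is found.  An 11-digit field needs 33 bits,
 * so the result is returned in a 64-bit integer.
 */
int
cpio_odc_field(const char *field, size_t width, uint64_t *out)
{
    uint64_t v = 0;
    size_t i;

    for (i = 0; i < width; i++) {
        if (field[i] < '0' || field[i] > '7')
            return -1;
        v = (v << 3) | (uint64_t)(field[i] - '0');
    }
    *out = v;
    return 0;
}
```

.Pp
An 11-character field of all sevens decodes to 8589934591, one byte short
of 8 gigabytes.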
.Ss New ASCII Format
The "new" ASCII format uses 8-byte hexadecimal fields for
all numbers and splits device numbers into separate fields
for major and minor numbers.
.Pp
.Bd -literal -offset indent
struct cpio_newc_header {
    char c_magic[6];
    char c_ino[8];
    char c_mode[8];
    char c_uid[8];
    char c_gid[8];
    char c_nlink[8];
    char c_mtime[8];
    char c_filesize[8];
    char c_devmajor[8];
    char c_devminor[8];
    char c_rdevmajor[8];
    char c_rdevminor[8];
    char c_namesize[8];
    char c_check[8];
};
.Ed
.Pp
Except as specified below, the fields here match those specified
for the new binary format above.
.Bl -tag -width indent
.It Va c_magic
The string
.Dq 070701 .
.It Va c_check
This field is always set to zero by writers and ignored by readers.
See the next section for more details.
.El
.Pp
The pathname is followed by NUL bytes so that the total size
of the fixed header plus pathname is a multiple of four.
Likewise, the file data is padded to a multiple of four bytes.
Note that this format supports only 4 gigabyte files (unlike the
older ASCII format, which supports 8 gigabyte files).
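.Pp
The field encoding and the alignment rule can be sketched in C as follows.
The 110-byte header size follows from the struct above (a 6-byte magic plus
thirteen 8-byte fields).
All names are hypothetical, and accepting both upper- and lower-case
hexadecimal digits is a choice made for this example rather than something
this page specifies.

```c
#include <stddef.h>
#include <stdint.h>

#define CPIO_NEWC_HEADER_SIZE 110   /* 6-byte magic + 13 fields of 8 */

/* Parse an 8-character hexadecimal field; 0 on success, -1 on error. */
int
cpio_newc_field(const char *field, uint32_t *out)
{
    uint32_t v = 0;
    int i;

    for (i = 0; i < 8; i++) {
        char c = field[i];
        uint32_t d;

        if (c >= '0' && c <= '9')
            d = (uint32_t)(c - '0');
        else if (c >= 'a' && c <= 'f')
            d = (uint32_t)(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F')
            d = (uint32_t)(c - 'A' + 10);
        else
            return -1;
        v = (v << 4) | d;
    }
    *out = v;
    return 0;
}

/* Number of NUL bytes needed to bring n up to a multiple of four. */
size_t
cpio_newc_pad(size_t n)
{
    return (4 - (n & 3)) & 3;
}

/* Offset of the file data from the start of the header. */
size_t
cpio_newc_data_offset(size_t namesize)
{
    size_t n = CPIO_NEWC_HEADER_SIZE + namesize;

    return n + cpio_newc_pad(n);
}
```

.Pp
With an 11-byte pathname (including its NUL), header and name total 121
bytes, so three padding bytes follow and the data starts at offset 124.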
.Pp
In this format, hardlinked files are handled by setting the
filesize to zero for each entry except the first one that
appears in the archive.
.Ss New CRC Format
The CRC format is identical to the new ASCII format described
in the previous section except that the
.Va c_magic
field is set to
.Dq 070702
and the
.Va c_check
field is set to the sum of all bytes in the file data.
This sum is computed treating all bytes as unsigned values
and using unsigned arithmetic.
Only the least-significant 32 bits of the sum are stored.
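.Pp
A minimal C sketch of this checksum is shown below.
The function name
.Fn cpio_crc_sum
is hypothetical; the sketch relies on the fact that arithmetic on
.Vt uint32_t
wraps modulo 2^32, which discards everything but the low 32 bits.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Sum all bytes of the file data as unsigned values.  The
 * accumulator is a uint32_t, so overflow wraps and only the
 * least-significant 32 bits of the sum are kept.
 */
uint32_t
cpio_crc_sum(const unsigned char *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i < len; i++)
        sum += data[i];
    return sum;
}
```

.Pp
For the three bytes of the string
.Dq abc ,
the sum is 97 + 98 + 99 = 294.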
.Ss HP variants
The
.Nm cpio
implementation distributed with HPUX used XXXX but stored
device numbers differently XXX.
.Ss Other Extensions and Variants
Sun Solaris uses additional file types to store extended file
data, including ACLs and extended attributes, as special
entries in cpio archives.
.Pp
XXX Others? XXX
.Sh SEE ALSO
.Xr cpio 1 ,
.Xr tar 5
.Sh STANDARDS
The
.Nm cpio
utility is no longer a part of POSIX or the Single UNIX Specification.
It last appeared in
.St -susv2 .
It has been supplanted in subsequent standards by
.Xr pax 1 .
The portable ASCII format is currently part of the specification for the
.Xr pax 1
utility.
.Sh HISTORY
The original cpio utility was written by Dick Haight
while working in AT&T's Unix Support Group.
It appeared in 1977 as part of PWB/UNIX 1.0, the
.Dq Programmer's Work Bench
derived from
.At v6
that was used internally at AT&T.
Both the new binary and old character formats were in use
by 1980, according to the System III source released
by SCO under their
.Dq Ancient Unix
license.
The character format was adopted as part of
.St -p1003.1-88 .
XXX when did "newc" appear? Who invented it? When did HP come out with their variant? When did Sun introduce ACLs and extended attributes? XXX
.Sh BUGS
The
.Dq CRC
format is misnamed, as it uses a simple checksum and
not a cyclic redundancy check.
.Pp
The binary formats are limited to 16 bits for user id, group id,
device, and inode numbers. They are limited to 16 megabyte and 2
gigabyte file sizes for the older and newer variants, respectively.
.Pp
The old ASCII format is limited to 18 bits for
the user id, group id, device, and inode numbers.
It is limited to 8 gigabyte file sizes.
.Pp
The new ASCII format is limited to 4 gigabyte file sizes.
.Pp
None of the cpio formats store user or group names,
which are essential when moving files between systems with
dissimilar user or group numbering.
.Pp
Especially when writing older cpio variants, it may be necessary
to map actual device/inode values to synthesized values that
fit the available fields.
With very large filesystems, this may be necessary even for
the newer formats.