IP-WARS.NET - a forward command post of the IP Wars

[Linux-ia64] optimizing __copy_user


SCO v The World

By nedu, Section Diary
Posted on Mon Jan 15th, 2007 at 08:59:04 EST

In SCO's Reply Memorandum in Support of its Motion for Summary Judgment on IBM's Sixth, Seventh and Eighth Counterclaims, Brent Hatch twice made a startling assertion:

  • “It is undisputed that SCO did not make any modifications to Linux.” (p.2 / p.6 in PDF)

  • “It is also undisputed that SCO did not modify Linux.” (p.6 / p.10 in PDF)

This issue has been explored before, and a few minutes of searching turns up this patch from Jun Nakajima.

Brent Hatch knows or should know the falsity of his assertion.

From: Jun Nakajima <jun_at_sco.com>
Date: 2000-10-19 08:59:04

Recently I found that __copy_user function takes a
relatively/significantly long time under some stress tests, and it turned
out that those cases are happening when the source and destination have
more than 16 bytes and the alignment are different. In fact, this
problem is pointed out in copy_user.S:
 * Fixme:
 *	- handle the case where we have more than 16 bytes and the alignment
 *	  are different.

Basically, it's copying byte by byte no matter how large the size is.

Following is a fix to that. I tested it using linux-2.4.0-test9.
-- 
Jun U Nakajima
Core OS Development
SCO/Murray Hill, NJ
Email: jun@sco.com, Phone: 908-790-2352 Fax: 908-790-2426
-----------------------------------------------------------------------
*** copy_user.S.org	Tue Oct 10 11:31:43 2000
--- copy_user.S Wed Oct 18 14:26:36 2000
***************
*** 51,57 ****
  // Tuneable parameters
  //
  #define COPY_BREAK	16	// we do byte copy below (must be >=16)
! #define PIPE_DEPTH	4	// pipe depth
  
  #define EPI		p[PIPE_DEPTH-1] // PASTE(p,16+PIPE_DEPTH-1)
  
--- 51,57 ----
  // Tuneable parameters
  //
  #define COPY_BREAK	16	// we do byte copy below (must be >=16)
! #define PIPE_DEPTH	6	// pipe depth
  
  #define EPI		p[PIPE_DEPTH-1] // PASTE(p,16+PIPE_DEPTH-1)
  
***************
*** 65,70 ****
--- 65,76 ----
  //
  // local registers
  //
+ #define t1		r2	// rshift in bytes
+ #define t2		r3	// lshift in bytes
+ #define rshift		r14	// right shift in bits
+ #define lshift		r15	// left shift in bits
+ #define word1 	r16
+ #define word2 	r17
  #define cnt		r18
  #define len2		r19
  #define saved_lc	r20
***************
*** 134,139 ****
--- 140,328 ----
	br.ret.sptk.few rp	// end of short memcpy
  
	//
+	// Not 8-byte alinged
+	//
+ diff_align_copy_user:
+	// At this point we know we have more than 16 bytes to copy
+	// and also that src and dest do _not_ have the same alignment.
+	and src2=0x7,src1				// src offset
+	and dst2=0x7,dst1				// dst offset
+	;;
+	// The basic idea is that we copy byte-by-byte at the head so 
+	// that we can reach 8-byte alignment for both src1 and dst1. 
+	// Then copy the body using software pipelined 8-byte copy, 
+	// shifting the two back-to-back words right and left, then copy 
+	// the tail by copying byte-by-byte.
+	//
+	// Fault handling. If the byte-by-byte at the head fails on the
+	// load, then restart and finish the pipleline by copying zeros
+	// to the dst1. Then copy zeros for the rest of dst1.
+	// If 8-byte software pipeline fails on the load, do the same as
+	// failure_in3 does. If the byte-by-byte at the tails fail, it is
+	// handled simply by failure_in_pipe1.
+	//
+	// The case p14 represents the source has more bytes in the
+	// the first word (by the shifted part), whereas the p15 needs to
+	// copy some bytes from the 2nd word from the source that has the
+	// tail of the 1st of the destination.
+	//
+ 
+	//
+	// Optimization. If dst1 is 8-byte aligned (not rarely), we don't need
+	// to copy the head to dst1, to start 8-byte copy software pipleline.
+	// We know src1 is not 8-byte aligned in this case.
+	//
+	cmp.eq p14,p15=r0,dst2
+ (p15) br.cond.spnt.few 1f
+	;;
+	sub t1=8,src2
+	mov t2=src2
+	;;
+	shl rshift=t2,3
+	sub len1=len,t1 				// set len1
+	;;
+	sub lshift=64,rshift
+	;; 
+	br.cond.spnt.few word_copy_user
+	;; 
+ 1:
+	cmp.leu p14,p15=src2,dst2
+	sub t1=dst2,src2
+	;;
+	.pred.rel "mutex", p14, p15
+ (p14) sub word1=8,src2				// (8 - src offset)
+ (p15) sub t1=r0,t1					// absolute value
+ (p15) sub word1=8,dst2				// (8 - dst offset)
+	;;
+	// For the case p14, we don't need to copy the shifted part to
+	// the 1st word of destination.
+	sub t2=8,t1
+ (p14) sub word1=word1,t1
+	;;
+	sub len1=len,word1				// resulting len
+	;;
+ (p15) shl rshift=t1,3 				// in bits
+ (p14) shl rshift=t2,3
+	;; 
+ (p14) sub len1=len1,t1
+	adds cnt=-1,word1
+	;; 
+	sub lshift=64,rshift
+	mov ar.ec=PIPE_DEPTH
+	mov pr.rot=1<<16	      // p16=true all others are false
+	mov ar.lc=cnt
+	;;
+ 2:
+	EX(failure_in_pipe2,(p16) ld1 val1[0]=[src1],1)
+	;;
+	EX(failure_out,(EPI) st1 [dst1]=val1[PIPE_DEPTH-1],1)
+	br.ctop.dptk.few 2b
+	;;
+ word_copy_user:
+	cmp.gtu p9,p0=16,len1
+ (p9)	br.cond.spnt.few 4f		// if (16 > len1) skip 8-byte copy
+	;;
+	shr.u cnt=len1,3		// number of 64-bit words
+	;;
+	adds cnt=-1,cnt
+	;;
+	.pred.rel "mutex", p14, p15
+ (p14) sub src1=src1,t2
+ (p15) sub src1=src1,t1
+	//
+	// Now both src1 and dst1 point to an 8-byte alinged address. And
+	// we have more than 8 bytes to copy.
+	//
+	mov ar.lc=cnt
+	mov ar.ec=PIPE_DEPTH
+	mov pr.rot=1<<16	      // p16=true all others are false
+	;;
+ 3:
+ #if PIPE_DEPTH >= 5
+ #define EPI_1 	p[PIPE_DEPTH-2]
+ #define EPI_2 	p[PIPE_DEPTH-3] 
+	//
+	// The pipleline consists of 4 stages:
+	// 1 (p16):	Load a word from src1
+	// 2 (EPI_2):	Shift two back-to-back val1[] right and left
+	// 3 (EPI_1):	Or them, saving the result to tmp
+	// 4 (EPI):	Store tmp to dst1
+	//
+	// To make it simple, use at least 2 (p16) loops to set up val1[n]
+	// because we need 2 back-to-back val1[] to get tmp.
+	// Note that this implies EPI_2 must be p18 or greater.
+	// 
+ 
+	EX(failure_out,(EPI) st8 [dst1]=tmp,8)
+ (EPI_1)	or tmp=word1,word2
+ (EPI_2)	shr.u word1=val1[PIPE_DEPTH-3],rshift
+ (EPI_2)	shl word2=val1[PIPE_DEPTH-4],lshift
+	EX(failure_in2,(p16) ld8 val1[0]=[src1],8)
+	br.ctop.dptk.few 3b
+	;;			// RAW on src1 when fall through from loop
+ #else
+	//
+	// The software pipeline above should be smoother because it does
+	// not have any stop bits.
+	//
+	EX(failure_in2,(p16) ld8 val1[0]=[src1],8)
+ (EPI) shr.u word1=val1[PIPE_DEPTH-1],rshift
+ (EPI) shl word2=val1[PIPE_DEPTH-2],lshift
+	;; 
+ (EPI) or tmp=word1,word2
+	;; 
+	EX(failure_out,(EPI) st8 [dst1]=tmp,8)
+	;;
+	br.ctop.dptk.few 3b
+	;;			// RAW on src1 when fall through from loop
+ #endif
+	.pred.rel "mutex", p14, p15
+ (p14) sub src1=src1,t1
+ (p14) adds dst1=-8,dst1
+ (p15) sub dst1=dst1,t1
+	;; 
+ 4:
+	// Tail correction.
+	//
+	// The problem with this piplelined loop is that the last word is not
+	// loaded and thus parf of the last word written is not correct. 
+	// To fix that, we simply copy the tail byte by byte.
+	;;
+	sub len1=endsrc,src1,1
+	;;
+	mov ar.lc=len1
+	;;
+ 5:
+	EX(failure_in_pipe1,ld1 tmp=[src1],1)
+	;; 
+	EX(failure_out,st1 [dst1]=tmp,1)
+	br.cloop.dptk.few 5b
+	;; 
+ 
+ #if 0
+	// Fixme. Use the simple loop above in the meantime.
+	// If the code is followed by the software pipleline below,
+	// instead of the simple loop above, it sometimes does not work
+	// as expected.
+	// 
+	adds len1=1,len1
+	;; 
+	mov ar.ec=PIPE_DEPTH
+	mov pr.rot=1<<16	      // p16=true all others are false
+	mov ar.lc=len1
+	;;
+ 5:
+	EX(failure_in_pipe1,(p16) ld1 val1[0]=[src1],1)
+ 
+	EX(failure_out,(EPI) st1 [dst1]=val1[PIPE_DEPTH-1],1)
+	br.ctop.dptk.few 5b
+	;;
+ #endif
+	mov pr=saved_pr,0xffffffffffff0000
+	mov ar.pfs=saved_pfs
+	br.ret.dptk.few rp
+ 
+	//
	// Beginning of long mempcy (i.e. > 16 bytes)
	//
  long_copy_user:
***************
*** 142,148 ****
	;;
	cmp.eq p10,p8=r0,tmp
	mov len1=len		// copy because of rotation
! (p8)	br.cond.dpnt.few 1b	// XXX Fixme. memcpy_diff_align 
	;;
	// At this point we know we have more than 16 bytes to copy
	// and also that both src and dest have the same alignment
--- 331,337 ----
	;;
	cmp.eq p10,p8=r0,tmp
	mov len1=len		// copy because of rotation
! (p8)	br.cond.dpnt.few diff_align_copy_user
	;;
	// At this point we know we have more than 16 bytes to copy
	// and also that both src and dest have the same alignment
***************
*** 267,272 ****
--- 456,476 ----
	mov ar.pfs=saved_pfs
	br.ret.dptk.few rp
  
+	//
+	// This is the case where the byte by byte copy fails on the load
+	// when we copy the head. We need to finish the pipeline and copy
+	// zeros for the rest of the destination. Since this happens
+	// at the top we still need to fill the body and tail.
+ failure_in_pipe2:
+	sub ret0=endsrc,src1	// number of bytes to zero, i.e. not copied
+ 2:
+ (p16) mov val1[0]=r0
+ (EPI) st1 [dst1]=val1[PIPE_DEPTH-1],1
+	br.ctop.dptk.few 2b
+	;;
+	sub len=enddst,dst1,1		// precompute len
+	br.cond.dptk.few failure_in1bis
+	;; 
  
	//
	// Here we handle the head & tail part when we check for alignment.
***************
*** 395,400 ****
--- 599,621 ----
	mov ar.pfs=saved_pfs
	br.ret.dptk.few rp
  
+ failure_in2:
+	sub ret0=endsrc,src1	// number of bytes to zero, i.e. not copied
+	;;
+ 3:
+ (p16) mov val1[0]=r0
+ (EPI) st8 [dst1]=val1[PIPE_DEPTH-1],8
+	br.ctop.dptk.few 3b
+	;;
+	cmp.ne p6,p0=dst1,enddst	// Do we need to finish the tail ?
+	sub len=enddst,dst1,1		// precompute len
+ (p6)	br.cond.dptk.few failure_in1bis
+	;;
+	mov pr=saved_pr,0xffffffffffff0000
+	mov ar.lc=saved_lc
+	mov ar.pfs=saved_pfs
+	br.ret.dptk.few rp
+ 
	//
	// handling of failures on stores: that's the easy part
	//
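
For readers who do not read IA-64 assembly, the shift-and-merge idea described in the patch's comments can be sketched in C. This is only an illustration under stated assumptions, not the kernel code: the names load_le64, store_le64 and copy_shifted are hypothetical, the source misalignment is passed in explicitly as off rather than derived from a pointer, little-endian byte order is assumed, and fault handling is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read 8 bytes as a little-endian 64-bit word (hypothetical helper). */
static uint64_t load_le64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Write a 64-bit word as 8 little-endian bytes (hypothetical helper). */
static void store_le64(unsigned char *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (unsigned char)(v & 0xff);
        v >>= 8;
    }
}

/*
 * Copy len bytes from aligned_src + off to dst. aligned_src stands in
 * for an 8-byte-aligned address and off (1..7) is the source
 * misalignment. The body reads whole words only and merges each pair
 * of neighbouring words; what whole-word reads cannot cover is copied
 * byte by byte.
 */
void copy_shifted(unsigned char *dst, const unsigned char *aligned_src,
                  unsigned off, size_t len)
{
    assert(off >= 1 && off <= 7);

    unsigned rshift = off * 8;      /* right shift in bits */
    unsigned lshift = 64 - rshift;  /* left shift in bits */

    /* Each output word needs two input words, so stop one word short
       of the last whole word available in the source. */
    size_t nwords = (off + len) / 8;
    if (nwords > 0)
        nwords--;

    if (nwords > 0) {
        uint64_t lo = load_le64(aligned_src);
        for (size_t i = 0; i < nwords; i++) {
            uint64_t hi = load_le64(aligned_src + 8 * (i + 1));
            store_le64(dst + 8 * i, (lo >> rshift) | (hi << lshift));
            lo = hi;
        }
    }

    /* Tail: finish byte by byte. */
    for (size_t j = 8 * nwords; j < len; j++)
        dst[j] = aligned_src[off + j];
}
```

With a 40-byte source buffer, calling copy_shifted(dst, src, 3, 29) should leave dst holding the 29 bytes that start at src + 3. The point of the technique is that the body loop moves eight bytes per iteration even though source and destination are aligned differently, instead of one byte per iteration.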
[Linux-ia64] optimizing __copy_user | 12 comments (12 topical, 0 editorial, 0 hidden)
IBM's opposition (4.00 / 3) (#3)
by nedu (nedu@netscape.net) on Tue Jan 16th, 2007 at 00:07:20 EST

In IBM's Redacted Memorandum in Opposition to SCO's Motion for Summary Judgment on IBM's Sixth, Seventh and Eighth Counterclaims, underneath the heading “Material Facts Requiring Denial of SCO's Motion”, continuing on page 7 (p.15 in PDF), IBM states:

17. In developing SCO Linux Server 4.0, SCO also made certain modifications and additions to the Linux 2.4.19 kernel, such that SCO Linux Server 4.0 is a derivative work of the Linux 2.4.19 kernel within the meaning of the GPL. (IBM Copy. Br. ¶¶ 12, 35; Ex. 474 at SCO1170557-558; Ex. 617; Ex. 128 § 2(b).)

(md5: eb4f0dd592e66b36c45b24478b16a6a5 IBM-881.pdf)

Thus, IBM stated clearly that SCO modified Linux. This fact is well known. Yet, replying directly to this particular memorandum, Brent O. Hatch (5715) asserted that the opposite was “undisputed.”

Mr. Hatch needs to explain this.



  • Exhibit 617 by nedu, 01/17/2007 17:02:51 EST (3.50 / 2)
Ralf Flaxa's Declaration (4.00 / 3) (#4)
by nedu (nedu@netscape.net) on Tue Jan 16th, 2007 at 10:44:25 EST

In his Declaration of 25 Sep 2006, Ralf Flaxa declares under penalty of perjury:

1. I was employed at Caldera, Inc. ("Caldera") on a freelance basis from November 1995 until October 1997. I was then a full-time Caldera employee from November 1997 until June 2002. I served as a Director of Caldera's Linux development team in Erlangen, Germany.

(p.1 / p.3 in PDF)

Caldera's Contributions to Linux

22. Caldera employees made several important contributions to Linux in the course of their employment with the company.

23. For example, one of Caldera's key contributions to Linux included IPX. Several Caldera engineers are credited in the CREDITS file in the Linux kernel source with contributions, including Jim Freeman, Greg Page, and Ron Holt.

24. Caldera also played a key role in convincing partners to contribute to Linux. Largely as a result of these efforts, Caldera engineers and I have been recognized within the Linux CREDITS files.

(p.4 / p.6 in PDF)

(md5: 97ed125acd586649fb84b102a66bb301 IBM-774EX598.pdf)

Mr. Hatch asserted (p.7 / p.11 in PDF):

SCO did not in fact make any modifications or additions to Linux; rather, SCO redistributed Linux 2.4.19 as is on two disks that it received from SuSE. (SCO Ex. 233 ¶¶ 18-23.) IBM presents no evidence otherwise.

But Mr. Hatch's assertion fails to rebut Mr. Flaxa's declaration.



  • Ransom Love's Declaration by nedu, 01/18/2007 13:05:02 EST (4.00 / 2)
  • Douglas B. Beattie's Declaration by nedu, 01/27/2007 21:59:03 EST (4.00 / 2)
    • Re: Douglas B. Beattie's Declaration by br3n, 01/30/2007 06:40:11 EST (none / 1)
2.4.20-pre4 (4.00 / 2) (#8)
by nedu (nedu@netscape.net) on Fri Jan 19th, 2007 at 22:55:18 EST

Kerneltrap: JFS merged in 2.4

From: Marcelo Tosatti (marcelo@conectiva.com.br)
To: linux-kernel
Date: Mon Aug 19 2002 - 17:46:16 EST
Subject: Linux 2.4.20-pre4

So here goes -pre4, with JFS merged.

Also, if you got bootup lockups or some unexpected weird error try
-pre4 ;)

Summary of changes from v2.4.20-pre3 to v2.4.20-pre4
============================================

[...]


Christoph Hellwig :
o JFS: Initial import of version 1.0.18 for Linux 2.4

Dave Kleikamp :
o JFS: Fix structure alignment problem on 64-bit machines
o JFS: Add hch's copyright
o JFS: sanitize ->clear_inode, remove ->put-inode
o Fix races in JFS threads
o JFS: Yet another truncation fix
o JFS does not need to set i_version. It is never used
o JFS: fix fsync
o procfs entries should be created when CONFIG_JFS_STATISTICS is set
o JFS: set s_maxbytes to 1 byte lower
o Rework JFS's inode locking
o JFS: Dynamically allocate metapage structures
o Remove d_delete calls from jfs_rmdir & jfs_unlink
o JFS: Fix handling of commit_sem
o Add resize function to JFS
o fix typo in fs/jfs/resize.c
o JFS: Replace depreciated initializer syntax with C99 style
o JFS: Trivial fixes

[...]

From Steve Best's declaration (10 Sep 2005):

8. Christoph Hellwig was the leading non-IBM contributor to Linux JFS project. Mr. Hellwig was a Caldera employee at the time.

(p.3 / p.4 in PDF)



  • Re: 2.4.20-pre4 by nedu, 01/19/2007 18:23:58 EST (none / 1)
    • Arrrgh!!!!! by nedu, 01/20/2007 00:34:55 EST (none / 1)
Re: [Linux-ia64] optimizing __copy_user (3.50 / 2) (#5)
by ColonelZen (tzellers lieth within pobox of thy kingdom com) on Tue Jan 16th, 2007 at 20:58:58 EST

I've made the point in many posts, particularly the canonical, that the existence of SCOX code in the Linux kernel is what makes SCOX's Source License germane and inescapably a sublicense to the Linux kernel, voiding their GPL rights.

Not being a lawyer I don't really know, but on the theory that legal==weasel, without that SCOX code they might escape on the grounds that the licenses had no overlapping subject matter, despite their earlier representations. IOW, an argument in the alternative: without that code, their license might be pure fraud, but it would not otherwise be grounds for asserting a GPL violation.

Thanks for pointing this out and giving a quick reference to specific code. I was aware of the old tlan drivers and I knew that Hellwig had contributed code, but I never had a specific reference I could cite in an argument. This fills that gap nicely.

Of course it probably won't be long before all argument about SCOX is moot. A toast to that happy day!

-- TWZ

Re: [Linux-ia64] optimizing __copy_user (none / 1) (#1)
by nedu (nedu@netscape.net) on Mon Jan 15th, 2007 at 09:05:45 EST
$ md5sum IBM-930.pdf
92164dab306dce46fd5ce2c9cd9600a6  IBM-930.pdf


[Linux-ia64] Update: optimizing __copy_user (none / 1) (#2)
by nedu (nedu@netscape.net) on Mon Jan 15th, 2007 at 09:26:25 EST

And here's an updated patch, which appears in the Linux IA64 archives, from Jun U Nakajima, Core OS Development, SCO/Murray Hill, NJ.

From: Jun Nakajima <jun_at_sco.com>
Date: 2000-10-24 07:04:47

I have updated __copy_user, reflecting the comments I got. Thanks for
those comments. The major changes are that it uses 'shrp' to make the
software pipeline shorter and more efficient (rather than using 'shr.u',
'shl', and 'or'). 

-- 
Jun U Nakajima
Core OS Development
SCO/Murray Hill, NJ
Email: jun@sco.com, Phone: 908-790-2352 Fax: 908-790-2426

-----------------------------------------------------------------------
*** copy_user.S.org	Tue Oct 10 11:31:43 2000
--- copy_user.S Mon Oct 23 12:00:06 2000
***************
*** 65,70 ****
--- 65,76 ----
  //
  // local registers
  //
+ #define t1		r2	// rshift in bytes
+ #define t2		r3	// lshift in bytes
+ #define rshift		r14	// right shift in bits
+ #define lshift		r15	// left shift in bits
+ #define word1 	r16
+ #define word2 	r17
  #define cnt		r18
  #define len2		r19
  #define saved_lc	r20
***************
*** 134,139 ****
--- 140,329 ----
	br.ret.sptk.few rp	// end of short memcpy
  
	//
+	// Not 8-byte alinged
+	//
+ diff_align_copy_user:
+	// At this point we know we have more than 16 bytes to copy
+	// and also that src and dest do _not_ have the same alignment.
+	and src2=0x7,src1				// src offset
+	and dst2=0x7,dst1				// dst offset
+	;;
+	// The basic idea is that we copy byte-by-byte at the head so 
+	// that we can reach 8-byte alignment for both src1 and dst1. 
+	// Then copy the body using software pipelined 8-byte copy, 
+	// shifting the two back-to-back words right and left, then copy 
+	// the tail by copying byte-by-byte.
+	//
+	// Fault handling. If the byte-by-byte at the head fails on the
+	// load, then restart and finish the pipleline by copying zeros
+	// to the dst1. Then copy zeros for the rest of dst1.
+	// If 8-byte software pipeline fails on the load, do the same as
+	// failure_in3 does. If the byte-by-byte at the tail fails, it is
+	// handled simply by failure_in_pipe1.
+	//
+	// The case p14 represents the source has more bytes in the
+	// the first word (by the shifted part), whereas the p15 needs to
+	// copy some bytes from the 2nd word of the source that has the 
+	// tail of the 1st of the destination.
+	//
+ 
+	//
+	// Optimization. If dst1 is 8-byte aligned (not rarely), we don't need
+	// to copy the head to dst1, to start 8-byte copy software pipleline.
+	// We know src1 is not 8-byte aligned in this case.
+	//
+	cmp.eq p14,p15=r0,dst2
+ (p15) br.cond.spnt.few 1f
+	;;
+	sub t1=8,src2
+	mov t2=src2
+	;;
+	shl rshift=t2,3
+	sub len1=len,t1 				// set len1
+	;;
+	sub lshift=64,rshift
+	;; 
+	br.cond.spnt.few word_copy_user
+	;; 
+ 1:
+	cmp.leu p14,p15=src2,dst2
+	sub t1=dst2,src2
+	;;
+	.pred.rel "mutex", p14, p15
+ (p14) sub word1=8,src2				// (8 - src offset)
+ (p15) sub t1=r0,t1					// absolute value
+ (p15) sub word1=8,dst2				// (8 - dst offset)
+	;;
+	// For the case p14, we don't need to copy the shifted part to
+	// the 1st word of destination.
+	sub t2=8,t1
+ (p14) sub word1=word1,t1
+	;;
+	sub len1=len,word1				// resulting len
+ (p15) shl rshift=t1,3 				// in bits
+ (p14) shl rshift=t2,3
+	;; 
+ (p14) sub len1=len1,t1
+	adds cnt=-1,word1
+	;; 
+	sub lshift=64,rshift
+	mov ar.ec=PIPE_DEPTH
+	mov pr.rot=1<<16	      // p16=true all others are false
+	mov ar.lc=cnt
+	;;
+ 2:
+	EX(failure_in_pipe2,(p16) ld1 val1[0]=[src1],1)
+	;;
+	EX(failure_out,(EPI) st1 [dst1]=val1[PIPE_DEPTH-1],1)
+	br.ctop.dptk.few 2b
+	;;
+	clrrrb
+	;;
+ word_copy_user:
+	cmp.gtu p9,p0=16,len1
+ (p9)	br.cond.spnt.few 4f		// if (16 > len1) skip 8-byte copy
+	;;
+	shr.u cnt=len1,3		// number of 64-bit words
+	;;
+	adds cnt=-1,cnt
+	;;
+	.pred.rel "mutex", p14, p15
+ (p14) sub src1=src1,t2
+ (p15) sub src1=src1,t1
+	//
+	// Now both src1 and dst1 point to an 8-byte alinged address. And
+	// we have more than 8 bytes to copy.
+	//
+	mov ar.lc=cnt
+	mov ar.ec=PIPE_DEPTH
+	mov pr.rot=1<<16	      // p16=true all others are false
+	;;
+ 3:
+	//
+	// The pipleline consists of 3 stages:
+	// 1 (p16):	Load a word from src1
+	// 2 (EPI_1):	Shift right pair, saving to tmp
+	// 3 (EPI):	Store tmp to dst1
+	//
+	// To make it simple, use at least 2 (p16) loops to set up val1[n]
+	// because we need 2 back-to-back val1[] to get tmp.
+	// Note that this implies EPI_2 must be p18 or greater.
+	//
+ 
+ #define EPI_1		p[PIPE_DEPTH-2]
+ #define SWITCH(pred, shift)	cmp.eq pred,p0=shift,rshift
+ #define CASE(pred, shift)	\
+	(pred)	br.cond.spnt.few copy_user_bit##shift
+ #define BODY(rshift)							\
+ copy_user_bit##rshift:						\
+ 1:									\
+	EX(failure_out,(EPI) st8 [dst1]=tmp,8);				\
+ (EPI_1) shrp tmp=val1[PIPE_DEPTH-3],val1[PIPE_DEPTH-2],rshift;	\
+	EX(failure_in2,(p16) ld8 val1[0]=[src1],8);			\
+	br.ctop.dptk.few 1b;						\
+	;;								\
+	br.cond.spnt.few .diff_align_do_tail
+ 
+	//
+	// Since the instruction 'shrp' requires a fixed 128-bit value
+	// specifying the bits to shift, we need to provide 7 cases
+	// below.
+	//
+	SWITCH(p6, 8)
+	SWITCH(p7, 16)
+	SWITCH(p8, 24)
+	SWITCH(p9, 32)
+	SWITCH(p10, 40)
+	SWITCH(p11, 48)
+	SWITCH(p12, 56)
+	;;
+	CASE(p6, 8)
+	CASE(p7, 16)
+	CASE(p8, 24)
+	CASE(p9, 32)
+	CASE(p10, 40)
+	CASE(p11, 48)
+	CASE(p12, 56)
+	;;
+	BODY(8)
+	BODY(16)
+	BODY(24)
+	BODY(32)
+	BODY(40)
+	BODY(48)
+	BODY(56)
+	;;
+ .diff_align_do_tail:
+	.pred.rel "mutex", p14, p15
+ (p14) sub src1=src1,t1
+ (p14) adds dst1=-8,dst1
+ (p15) sub dst1=dst1,t1
+	;;
+ 4:
+	// Tail correction.
+	//
+	// The problem with this piplelined loop is that the last word is not
+	// loaded and thus parf of the last word written is not correct.
+	// To fix that, we simply copy the tail byte by byte.
+ 
+	sub len1=endsrc,src1,1
+	clrrrb
+	;;
+	mov ar.ec=PIPE_DEPTH
+	mov pr.rot=1<<16	// p16=true all others are false
+	mov ar.lc=len1
+	;;
+ 5:
+	EX(failure_in_pipe1,(p16) ld1 val1[0]=[src1],1)
+ 
+	EX(failure_out,(EPI) st1 [dst1]=val1[PIPE_DEPTH-1],1)
+	br.ctop.dptk.few 5b
+	;;
+	mov pr=saved_pr,0xffffffffffff0000
+	mov ar.pfs=saved_pfs
+	br.ret.dptk.few rp
+ 
+	//
	// Beginning of long mempcy (i.e. > 16 bytes)
	//
  long_copy_user:
***************
*** 142,148 ****
	;;
	cmp.eq p10,p8=r0,tmp
	mov len1=len		// copy because of rotation
! (p8)	br.cond.dpnt.few 1b	// XXX Fixme. memcpy_diff_align 
	;;
	// At this point we know we have more than 16 bytes to copy
	// and also that both src and dest have the same alignment
--- 332,338 ----
	;;
	cmp.eq p10,p8=r0,tmp
	mov len1=len		// copy because of rotation
! (p8)	br.cond.dpnt.few diff_align_copy_user
	;;
	// At this point we know we have more than 16 bytes to copy
	// and also that both src and dest have the same alignment
***************
*** 267,272 ****
--- 457,477 ----
	mov ar.pfs=saved_pfs
	br.ret.dptk.few rp
  
+	//
+	// This is the case where the byte by byte copy fails on the load
+	// when we copy the head. We need to finish the pipeline and copy
+	// zeros for the rest of the destination. Since this happens
+	// at the top we still need to fill the body and tail.
+ failure_in_pipe2:
+	sub ret0=endsrc,src1	// number of bytes to zero, i.e. not copied
+ 2:
+ (p16) mov val1[0]=r0
+ (EPI) st1 [dst1]=val1[PIPE_DEPTH-1],1
+	br.ctop.dptk.few 2b
+	;;
+	sub len=enddst,dst1,1		// precompute len
+	br.cond.dptk.few failure_in1bis
+	;; 
  
	//
	// Here we handle the head & tail part when we check for alignment.
***************
*** 395,400 ****
--- 600,622 ----
	mov ar.pfs=saved_pfs
	br.ret.dptk.few rp
  
+ failure_in2:
+	sub ret0=endsrc,src1	// number of bytes to zero, i.e. not copied
+	;;
+ 3:
+ (p16) mov val1[0]=r0
+ (EPI) st8 [dst1]=val1[PIPE_DEPTH-1],8
+	br.ctop.dptk.few 3b
+	;;
+	cmp.ne p6,p0=dst1,enddst	// Do we need to finish the tail ?
+	sub len=enddst,dst1,1		// precompute len
+ (p6)	br.cond.dptk.few failure_in1bis
+	;;
+	mov pr=saved_pr,0xffffffffffff0000
+	mov ar.lc=saved_lc
+	mov ar.pfs=saved_pfs
+	br.ret.dptk.few rp
+ 
	//
	// handling of failures on stores: that's the easy part
	//
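
The updated patch replaces the separate shift-right, shift-left and or steps with a single 'shrp', and then dispatches to one of seven loop bodies because the shift count has to be fixed in each instruction. A rough C model of that structure might look like the following. It is a sketch, not the kernel code: shift_right_pair, merge_words and the C BODY macro are hypothetical names chosen to echo the patch, and the model ignores software pipelining and fault handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C model of a "shift right pair": treat hi:lo as one 128-bit value,
   shift it right by count bits, and keep the low 64 bits. */
static uint64_t shift_right_pair(uint64_t hi, uint64_t lo, unsigned count)
{
    assert(count > 0 && count < 64);
    return (lo >> count) | (hi << (64 - count));
}

/* One loop body per shift amount, echoing the patch's BODY() macro;
   the shift is a compile-time constant inside each case. */
#define BODY(bits)                                              \
    case bits:                                                  \
        for (size_t i = 0; i < n; i++)                          \
            dst[i] = shift_right_pair(w[i + 1], w[i], bits);    \
        break;

/* Merge n+1 input words into n output words for a source offset of
   rshift/8 bytes. Returns 0 on success, or -1 if rshift is not one of
   the seven supported values. */
int merge_words(uint64_t *dst, const uint64_t *w, size_t n, unsigned rshift)
{
    switch (rshift) {
    BODY(8) BODY(16) BODY(24) BODY(32) BODY(40) BODY(48) BODY(56)
    default:
        return -1;
    }
    return 0;
}
```

The seven cases correspond to source offsets of one to seven bytes. An offset of zero never reaches this path, since equal alignment is handled by the existing long_copy_user code.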



All trademarks and copyrights on this page are owned by their respective companies or owners.
Comments, articles and logbooks are owned by the Poster. By posting on the ip-wars.net web site, all posters grant a license to ip-wars.net to publish the content and release it pursuant to the Creative Commons License that covers the rest of the site. For more details, please check out the Standard Operating Procedures. Also, please read the Privacy Policy for the site. Finally, DO NOT send e-mail to the site owner (Jeff Causey) unless you have read and agree to the terms regarding e-mail included in the Standard Operating Procedures.
Everything else © 2004, 2005, 2006, 2007 ip-wars.net and Jeffrey G. Causey and is licensed under a Creative Commons License.