How can I use sed to replace copyright/license headers in my source files?

I need to replace the LGPL license header in all of my Java source files with the Apache License 2.0 header, i.e., this:
/*
 * Copyright (c) 2012 Tyler Treat
 * 
 * This file is part of Project Foo.
 *
 * Project Foo is free software: you can redistribute it and/or modify
 * it under the terms of the GNU Lesser General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Project Foo is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public License
 * along with Project Foo.  If not, see .
 */

needs to become
/*
 * Copyright (c) 2012 Tyler Treat
 * 
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 * 
 *  http://www.apache.org/licenses/LICENSE-2.0
 * 
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

I figured the easiest way would be to use sed to find and replace all occurrences of this copyright header. I'm a bit of a Unix novice, so I had trouble getting the command to work the way I needed it to, specifically with the multiline strings. Basically, I want something like the command below, but with the respective headers in place of foo and bar:
find . -name "*.java" -print | xargs sed -i 's/foo/bar/g'

I understand that sed works on one line at a time, so maybe there is a better solution altogether?

Solutions/Answers:

Answer 1:

find . -name "*.java" -print0 | xargs -0 \
sed -i -e '/Project Foo is free software/,/along with Project Foo/c\
 * Licensed under the Apache License, Version 2.0 (the "License");\
 * you may not use this file except in compliance with the License.\
 * You may obtain a copy of the License at\
 *\
 *  http://www.apache.org/licenses/LICENSE-2.0\
 *\
 * Unless required by applicable law or agreed to in writing, software\
 * distributed under the License is distributed on an "AS IS" BASIS,\
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\
 * See the License for the specific language governing permissions and\
 * limitations under the License.'

The c command changes the range of lines to the specified text. The range is identified by the line containing ‘Project Foo is free software’ up to the line containing ‘along with Project Foo’.
The -i option to sed indicates GNU sed, so I’m assuming that you have GNU find and xargs too; I used -print0 and -0 to avoid problems with blanks in file names and the like.
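To see what the c command does to a single file, here is a minimal sketch that runs it on a throwaway sample. It assumes GNU sed; the file name Foo.java, the shortened header, and the one-line replacement are made up for the demo, and the replacement line deliberately has no leading ‘ * ’ because of the leading-blank problem discussed below.

```shell
#!/bin/sh
# Minimal sketch (GNU sed assumed): apply the range-change (c) command to one
# throwaway file. The sample header below is a shortened, made-up stand-in.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

cat > "$tmp/Foo.java" <<'EOF'
/*
 * Copyright (c) 2012 Tyler Treat
 *
 * This file is part of Project Foo.
 *
 * Project Foo is free software: you can redistribute it and/or modify
 * it under the terms of the GNU Lesser General Public License as published by
 * along with Project Foo.  If not, see .
 */
public class Foo {}
EOF

# Every line from the first match through the second is replaced, as a block,
# by the text after c\. Lines outside the range are left alone.
sed -i -e '/Project Foo is free software/,/along with Project Foo/c\
Licensed under the Apache License, Version 2.0 (the "License");' "$tmp/Foo.java"

cat "$tmp/Foo.java"
```

Only the matched range changes, so the ‘This file is part of Project Foo.’ line survives and may need handling of its own.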

For this, I might be tempted to put the sed script into a file (sed.script), which could then be used with:

find . -name "*.java" -exec sed -i -f sed.script {} +

This is neater, I think, but beauty is in the eye of the beholder.
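A sketch of the script-file variant run over a small directory tree might look like this. The directory layout, the file names One.java and Two.java, and the one-line replacement text are invented for the demo, and GNU find and sed are assumed.

```shell
#!/bin/sh
# Sketch: keep the sed program in its own file and apply it to every .java
# file under a directory with one find -exec ... {} + invocation.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/src/a" "$tmp/src/b"

# Made-up sample sources with a cut-down LGPL block.
for f in "$tmp/src/a/One.java" "$tmp/src/b/Two.java"; do
  printf '%s\n' '/*' \
    ' * Project Foo is free software: you can redistribute it and/or modify' \
    ' * along with Project Foo.  If not, see .' \
    ' */' 'class X {}' > "$f"
done

# The sed program lives in a file, so it needs no shell quoting of its own.
cat > "$tmp/sed.script" <<'EOF'
/Project Foo is free software/,/along with Project Foo/c\
Licensed under the Apache License, Version 2.0 (the "License");
EOF

# {} + hands as many file names as fit to each sed run, much like xargs.
find "$tmp/src" -name '*.java' -exec sed -i -f "$tmp/sed.script" {} +

grep -c 'Apache License' "$tmp/src/a/One.java" "$tmp/src/b/Two.java"
```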

Follow-up from the asker: the alignment of the asterisks is a little off. Is there some sort of whitespace character I need to use to indent them? I tried adding spaces to the replacement string, but that seemed to have no effect.

Grrr… that’s the sort of irritation I could do without (and so could you). It seems that sed drops the leading blanks on the ‘change’ data lines. The culprit appears to be sed rather than bash: I got the same result with ksh, and also when using a script file instead of the -e option on the command line. You also can’t post-process the ‘change’ data with further sed commands, because it is written straight to the output.

One trick that would work, though you may not be keen on it:

$ cat sed.script
/Project Foo is free software/,/along with Project Foo/c\
 * Licensed under the Apache License, Version 2.0 (the "License");\
 * you may not use this file except in compliance with the License.\
 * You may obtain a copy of the License at\
 *\
 *  http://www.apache.org/licenses/LICENSE-2.0\
 *\
 * Unless required by applicable law or agreed to in writing, software\
 * distributed under the License is distributed on an "AS IS" BASIS,\
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\
 * See the License for the specific language governing permissions and\
 * limitations under the License.
$ s2p -f sed.script > perl.script
$ find . -name "*.java" -exec perl -f perl.script -i.bak {} +
$

The s2p program, which converts sed scripts into Perl scripts, was a standard part of the Perl distribution (it was dropped from the core in Perl 5.22), and unlike sed it preserves the leading spaces in the change data. I’m not keen on this, but the only alternative I can think of is making two passes through each file. For the two-pass approach, the sed script might be:

$ cat sed.script
/Project Foo is free software/,/along with Project Foo/c\
@*@ Licensed under the Apache License, Version 2.0 (the "License");\
@*@ you may not use this file except in compliance with the License.\
@*@ You may obtain a copy of the License at\
@*@\
@*@  http://www.apache.org/licenses/LICENSE-2.0\
@*@\
@*@ Unless required by applicable law or agreed to in writing, software\
@*@ distributed under the License is distributed on an "AS IS" BASIS,\
@*@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\
@*@ See the License for the specific language governing permissions and\
@*@ limitations under the License.
$

After doing the main text replacement, you’d then do:

$ find . -name "*.java" -exec sed -i 's/^@\*@/ */' {} +
$

This tracks down the lines starting with @*@ and replaces that text with ‘ *’ (blank-star). It’s not as neat and tidy, but you aren’t going to be doing this all that often, I trust.
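The two passes can be tried end to end on a single sample file with a sketch like the following. It assumes GNU sed; the sample header and the two-line replacement are shortened stand-ins for the real text.

```shell
#!/bin/sh
# Sketch of the two-pass trick: pass 1 writes placeholder lines starting with
# @*@ (no leading blank for sed to drop); pass 2 rewrites that prefix.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' '/*' ' * Copyright (c) 2012 Tyler Treat' \
  ' * Project Foo is free software: you can redistribute it and/or modify' \
  ' * along with Project Foo.  If not, see .' ' */' > "$tmp/Foo.java"

# Pass 1: change the LGPL range into placeholder-prefixed lines.
sed -i '/Project Foo is free software/,/along with Project Foo/c\
@*@ Licensed under the Apache License, Version 2.0 (the "License");\
@*@ you may not use this file except in compliance with the License.' "$tmp/Foo.java"

# Pass 2: turn each leading @*@ back into " *" (blank-star).
sed -i 's/^@\*@/ */' "$tmp/Foo.java"

cat "$tmp/Foo.java"
```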

Answer 2:

Partial License Replacement Using GNU Sed

You can use GNU sed to solve this with some regular-expression line matches and the r (read file) command. Here are the steps.

Use a File to Hold Replacement Text

First, create a file to hold the replacement portion of your license:

cat << EOF > /tmp/license
 * 
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 * 
 *  http://www.apache.org/licenses/LICENSE-2.0
 * 
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
EOF

Run the Actual Sed Invocation

Next, run find to collect your file list, and invoke the following sed script to make the changes:

find . -name '*.java' |
xargs sed -i'' '/Copyright.*Tyler Treat/,/\*\// {
                    /Copyright/n
                    /\*\//r /tmp/license
                    d
                }'
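Here is a sketch of the same script run against one sample file, to make the n/r/d interplay visible. The sample header, the shortened license text, and the mktemp paths are invented for the demo; GNU sed is assumed, and the script is in double quotes only so the temporary path can be spliced into the r command.

```shell
#!/bin/sh
# Sketch (GNU sed assumed): the read-and-delete approach on one sample file.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Replacement body: everything after the Copyright line, including the " */".
cat > "$tmp/license" <<'EOF'
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 */
EOF

printf '%s\n' '/*' ' * Copyright (c) 2012 Tyler Treat' ' *' \
  ' * This file is part of Project Foo.' ' */' 'class Foo {}' > "$tmp/Foo.java"

# Within the Copyright..*/ range: print the Copyright line and fetch the next
# one (n), queue the license file on the closing line (r), delete the rest (d).
sed -i "/Copyright.*Tyler Treat/,/\*\// {
  /Copyright/n
  /\*\//r $tmp/license
  d
}" "$tmp/Foo.java"

cat "$tmp/Foo.java"
```

The r text is queued rather than printed immediately, so it still appears even though d discards the original closing line; the */ at the end of the license file takes that line's place.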

Compatibility Note

This solution may or may not work with other versions of sed, but it was tested locally and is known to work with GNU sed version 4.2.1. If it doesn’t work with the version of sed shipped with your edition of OS X, you can install GNU sed via MacPorts or similar.

Answer 3:

Assuming file1 contains your original text and file2 contains your replacement copyright comment:

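awk 'f; !f && /\*\//{system("cat file2"); f=1}' file1

The above looks for the first end-of-comment line in the original file; when it finds it, it cats the replacement file and turns on printing for the remainder of the original file. The !f guard stops later */ lines from inserting the header again. The result goes to standard output, so redirect it to a new file and move that over the original.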
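Since the awk command handles one file at a time and writes to standard output, applying it to every .java file needs a small loop. A sketch, assuming the new header lives in a file (hypothetically named header.txt here) that holds the complete replacement comment, including the opening /* and closing */:

```shell
#!/bin/sh
# Sketch: run the awk header swap over every .java file, writing each result
# to a temporary file and then moving it over the original.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Made-up sample source and replacement header.
printf '%s\n' '/*' ' * Copyright (c) 2012 Tyler Treat' \
  ' * GNU Lesser General Public License' ' */' 'class A {}' > "$tmp/A.java"
printf '%s\n' '/*' ' * Copyright (c) 2012 Tyler Treat' \
  ' * Licensed under the Apache License, Version 2.0 (the "License");' \
  ' */' > "$tmp/header.txt"

# At the first "*/" print the new header; after that, print every line.
find "$tmp" -name '*.java' | while IFS= read -r f; do
  awk -v hdr="$tmp/header.txt" \
    'f; !f && /\*\//{system("cat " hdr); f=1}' "$f" > "$f.new"
  mv "$f.new" "$f"
done

cat "$tmp/A.java"
```

Nothing is printed by awk before the first system() call, so cat's output and awk's own output land in the right order.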