Little tool to extract all files out of Sharepoint

November 2, 2006

We’ve been using Microsoft Sharepoint here at Plaxo to organize our projects and files. For those who have used Sharepoint, you know it’s a love/hate relationship 🙂

Anyways, we needed to get a copy of all the files in our Sharepoint site recently. Looking around on the web, we found a few solutions, but couldn’t really find one that did exactly what we wanted. So, I wrote up a quick little app to grab all the files we needed out of Sharepoint, while preserving Sharepoint’s directory structure. It’s a command line tool and it doesn’t have any options.

It’s really not that hard; here’s the source in case you’re interested:

using System;
using System.Collections.Generic;
using System.Text;
using System.Data;
using System.Data.SqlClient;
using System.IO;

// replace this string with your Sharepoint content DB connection string
string DBConnString = “Server=YOURSHAREPOINTSERVER;Database=CONTENTDATABASE;Trusted_Connection=True;”;

// create a DB connection
SqlConnection con = new SqlConnection(DBConnString);
con.Open();

// the query to grab all the files.
// Note: Feel free to alter the LeafName like ‘%.extension’ arguments to suit your purpose
SqlCommand com = con.CreateCommand();
com.CommandText = “select DirName, LeafName, Content from Docs where (LeafName like ‘%.doc’ or LeafName like ‘%.xls’ or LeafName like ‘%.pdf’ or LeafName like ‘%.ppt’) and Content is not NULL”;

// execute query
SqlDataReader reader = com.ExecuteReader();

while (reader.Read())
{
    // grab the file’s directory and name
    string DirName = (string)reader[“DirName”];
    string LeafName = (string)reader[“LeafName”];

    // create directory for the file if it doesn’t yet exist
    if (!Directory.Exists(DirName))
    {
        Directory.CreateDirectory(DirName);
        Console.WriteLine(“Creating directory: “ + DirName);
    }

    // create a filestream to spit out the file
    FileStream fs = new FileStream(DirName + “/” + LeafName, FileMode.Create, FileAccess.Write);
    BinaryWriter writer = new BinaryWriter(fs);

    // depending on the speed of your network, you may want to change the buffer size (it’s in bytes)
    int bufferSize = 1000000;
    long startIndex = 0;
    long retval = 0;
    byte[] outByte = new byte[bufferSize];

    // grab the file out of the db one chunk (of size bufferSize) at a time
    do
    {
        retval = reader.GetBytes(2, startIndex, outByte, 0, bufferSize);
        startIndex += bufferSize;

        writer.Write(outByte, 0, (int)retval);
        writer.Flush();
    } while (retval == bufferSize);

    // finish writing the file
    writer.Close();
    fs.Close();

    Console.WriteLine(“Finished writing file: “ + LeafName);
}

// close the DB connection and whatnots
reader.Close();
con.Close();

Here’s my VS.NET 2005 project file. If you need help with the code, please let me know.

Posted by markjen
Filed in Microsoft, Plaxo, technology, work

45 Comments »

45 Responses to “Little tool to extract all files out of Sharepoint”

rose Says:

November 3, 2006 at 12:13 pm
hmm, i think you could have just used sharepoint in the folder view, then dragged everything onto your desktop 🙂

ah, how i wished we used sharepoint here at work…

Reply
adam herscher Says:

November 7, 2006 at 10:45 am
out of curiosity, why did you need to do this?

Reply
Chupes Says:

March 12, 2007 at 6:31 am
hello. pardon my newbieness here, but is this a sql script?o what other language?

Reply
markjen Says:

March 23, 2007 at 9:25 am
Hi Chupes,

Yup, this is a simple SQL query that extracts all the files in sharepoint and writes them to the file system.

It’s written in C#, but it’s .NET, so you could easily convert it to VB.NET, C++.NET, etc.

Reply
m Says:

April 26, 2007 at 2:48 pm
you rock

Reply
Steven Borg Says:

July 10, 2007 at 7:26 am
Nice work!!! Yes, the theory is that you can just drag and drop from a folder view, however, that sucks when you’re trying to automate the retrieval! Or, like in our case, where the entire WSS 2.0 site got completely hosed during the upgrade to WSS 3.0 and we could see all the files, but not download them. This script was a lifesaver!

Reply
Eric Says:

July 31, 2007 at 7:57 am
Mark, When I parse or execute this script in the Query Analyzer I get the following error, and I cannot pinpoint the problem. Any suggestions? Because I too need to pull documents from a wss2 database that is not in use anymore.

[Microsoft][ODBC SQL Server Driver]Syntax error or access violation

Reply
Eric Says:

July 31, 2007 at 12:59 pm
Ignore the message above. I now have the code and your C# project file loaded in vs.net. Running the debug led me to find an error that states the Content column dosen’t exist. This database is the Moss 07 beta, so I am not sure if that is the issue. Since the code just checks to see if it is null, I took it out.

Now I think I am so close becasue it will actually create a directory structure, but errors on pulling the file over in “chunks”. The debug says the “Index was outside the bounds of the array” and it is point to the outByte portion of this statement:

retval = reader.GetBytes(2, startIndex, outByte, 0, bufferSize)

I’m still looking for the answer myself and I am hoping I can get this to work. Any help would be great, thanks.

Reply
Eric Says:

August 7, 2007 at 11:22 am
I have modified the code to work with SharePoint 07. Since the Content is stored in a new table with 07, the query piece needs to be a little different. Just replace line 24 with the sting below. It worked perfectly for me after that.

Find
com.CommandText = “select DirName, LeafName, Content from Docs where (LeafName like ‘%.doc’ or LeafName like ‘%.xls’ or LeafName like ‘%.pdf’ or LeafName like ‘%.ppt’) and Content is not NULL”;

Replace With
com.CommandText = "select [AllDocs].[DirName], [AllDocs].[LeafName], [AllDocStreams].[Content] from [AllDocs],[AllDocStreams] where (LeafName like '%.doc' or LeafName like '%.xls' or LeafName like '%.pdf' or LeafName like '%.ppt') and [AllDocStreams].[Content] is not NULL and [AllDocs].[Id] = [AllDocStreams].[Id]";

Reply
mINDAUGAS Says:

November 3, 2007 at 1:54 pm
Hi, is there any way to use your tool for default Windows Internal Database msde i’m trying to connect to database seting Serter=fs\microsoft##ssee with no sucess. fs -is the same host there installated sharepoint service and internal database .
thanks.

Reply
bud Says:

November 16, 2007 at 10:30 am
you da man!sharepoint config db corrupted but was able to recover our files with your script!thank you! thank you!

Reply
JeffD Says:

November 19, 2007 at 1:07 pm
Thanks!!! The WSS 2>3 converter left my database (with over 2 gigs of files) hosed. Your script got me back an intact file system full of my files. Don’t know how to say thanks enough.

Reply
wbrproductions Says:

November 30, 2007 at 8:44 am
markjen and #9 are the shiznizkydoodle! Thanks, folks, you just saved my butt. That worked like a goddam lucky charm. 2 weeks of frustration dissolved by a 15 second script.

Reply
Alex Guzun Says:

December 7, 2007 at 4:31 am
My MOSS2007 just gone to hell… got you app. Fantastic job…
Is it possible to get lists? We had a big amount of data in lists 😦
To get wiki, i think it is enough to add *.html

Thank you very much in advance.

Reply
Alex Guzun Says:

December 7, 2007 at 6:40 am
Another problem 😦 It can’t created looong directories. Program stops when tried to create dir witch full path contain 194 symbols 😦 Is it possible to made something (at least) to pass that dir, not to fail?

Reply
Alex Guzun Says:

December 8, 2007 at 5:57 am
Any solution to skip ‘long directory’?

Unhandled Exception: System.IO.PathTooLongException: The specified path, file name, or both are too long. The fully qual
ified file name must be less than 260 characters, and the directory name must be less than 248 characters.
at System.IO.Path.NormalizePathFast(String path, Boolean fullCheck)
at System.IO.Path.GetFullPathInternal(String path)
at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShar
e share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy)
at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access)
at DoxoEater.Program.Main(String[] args) in c:\!!!\DoxoEater\DoxoEater\Program.cs:line 44

Reply
markjen Says:

December 16, 2007 at 8:47 am
Hi Alex, you could run a quick check on the path size before line 44 in program.cs. Not sure what you’d do with the files that were supposed to be in the long-name directory though. Perhaps you could write a small function that would shorten the directory name?

Reply
Happy Says:

January 8, 2008 at 6:47 am
Thanks a million. You program saved our documents.
I have used several days to try to get Moss to do this. Now I can delete all and make a new installment.

thanks
🙂

Reply
Ryan Says:

January 24, 2008 at 12:55 pm
How can I get the latest version of this code? I have been reading through this and there seems like some bugs have been fixed.

What would make this perfect for me is if there was logging that confirms each file did or did not get written to a drive.

Does anyone know of a program similar to this that I can purchase?

Reply
ryan Says:

January 26, 2008 at 10:33 pm
It’s me again. I have been using this script and it has worked very well.

Does anyone know if there is a simular script to extract List data?

Reply
joe Says:

February 23, 2008 at 9:01 am
Is there any code out there to unload/extract calendar data in a script for some automation purposes I have?

Reply
Wordpress Says:

April 4, 2008 at 11:34 pm
developmentchinese jacketcompetition Chinesedressdevelopment capitalsilk clothesStreet grilchinese dresscultural womancheongsamhistory ÓÍÕôÆøÕæ¿Õ±ÃÓ°Ïì 0.8833553045874069

Reply
Dwain Says:

April 9, 2008 at 12:25 pm
I thought I’d pile on and say THANK YOU – this is a great little utility. I added an args parameter so I could get the documents out of any specific site, but your code was a great start. Thanks!

Reply
abhijeet Says:

June 29, 2008 at 10:20 am
Getting errors on pulling the file over in “chunks”. The debug says the “Index was outside the bounds of the array” and it is point to the outByte portion of this statement:

retval = reader.GetBytes(2, startIndex, outByte, 0, bufferSize)

earlier eric posted the same problem but here i haven’t see any comment about that, i f any one have idea about this please help me..

I’m still looking for the answer myself and I am hoping I can get this to work. Any help would be great, thanks.

Reply
Paul Says:

July 7, 2008 at 7:38 am
This is a great script, but you will notice that this does not maintain version information. In order to do so you need the following commands

select [DirName], [LeafName], [Content], [UIVersion] as Version from [Docs] where [Content] is not NULL

select [AllDocs].[DirName], [AllDocs].[LeafName], [AllDocStreams].[Content],[AllDocs].[UIVersionString] as Version from [AllDocs],[AllDocStreams] where [AllDocStreams].[Content] is not NULL and [AllDocs].[Id] = [AllDocStreams].[Id]

Add the following to the reader.Read loop

string Version = (string)reader[“Version”].ToString();

//Add this below the reader assignment of leafName
string leafName= “[” + Version.ToString() + “]” + LeafName;

I think that this will address all scenarios where a file is over written. As a precaution I have also added some logging in my script to check and see if the exists before I extract it. I recommend that you do the same.

Reply
Wes Says:

August 25, 2008 at 6:42 pm
Thanks, works very well save one issue for me. I can dump doc, xls, etc but we had a folder of images (.jpg) that when I attempt to dump throws “Path cannot be the empty string or all whitespace” at Directory.CreateDirectory(DirName). I’m a complete code newb, is there anyway to add a modifier to the directory name so that it cannot be empty or something like that? Thanks!

Reply
Exporting site content from a SharePoint Content Database for recovery purposes. « Krichie - That SharePoint Guy Says:

August 26, 2008 at 6:04 am
[…] Filed under: SharePoint — Keith Richie @ 11:03 pm My good friend Todd Klindt pointed me to this posting written by Mark Jen which showed some sample code to export site content from a SharePoint Content […]

Reply
Ivan Morrison Says:

September 4, 2008 at 6:02 am
Hi there!

Obviously, I am missing somehting here, I cant get the script to run, I tried putting it in SQL query editor, get a WACK of errors with syntax, so that cant be it, I tried compliling with CSC, again no go, so what is it that I am missing?

Thanks in advance!

Reply
Ivan Morrison Says:

September 4, 2008 at 7:56 am
I got it to run!! Since I wasnt to up on my C# application building, the last link (from Keith) have the project structure, and with some copy pasta, I was able to complile! (“‘s needed to be converted to “‘s )

Thanks, you saved our “bacon” !!!

Reply
frank Says:

September 9, 2008 at 2:43 pm
Thanks for the information, i,m new here looking for more info on share poit and shere file! Thanks again!

Reply
roger Says:

October 9, 2008 at 3:28 am
Excellent !

Excactly what we needed after our sharepointserver tanked…. 🙂

thanks a bunch

Reply
Nathan B Says:

November 21, 2008 at 10:49 am
I’m a bit loathe to admit I had to use this utility, but sometimes disasters happen.

Onto my answer:

People were having issues with the “Index was Outside the bounds of the array.” of the GetBytes command. Googling answers for this told me that the GetBytes command gets the bytes of a value in the array.

Since I used PLAXOED! and KRichie’s blog, I had an amalgamation of the two authors’ code.
It turns out the GetBytes command’s first parameter corresponds to which index of an array to get the bytes from. In this case, the array is a row of our SQL query at the beginning–
Because I had ‘4’ from Krichie’s code, I was trying to get the value of non-existent column in the array from the PLaxoed! code.

Changing this number to 2 (the index corresponding to the Content column) fixed it for me.

Great code, thank goodness!

Reply
Dave Says:

November 26, 2008 at 5:51 am
This was quite useful.
I used this SQL instead to get all files.

“SELECT DirName, LeafName, Content FROM Docs WHERE Type = 0 and Content is not NULL”;

Reply
brian Hman Says:

January 21, 2009 at 9:51 am
You know I don’t usually leave comments on code that I find online, but this is VERY VERY useful. Thank you so much for your post!

Brian Hman

Reply
Robert A Says:

April 22, 2009 at 1:20 pm
Hello! I’m going to ask a stupid question, how do I run this script?? Do I need Visual Studio and compile it and run it as an executable? Could someone please explain this step by step. Really need to do this on my damaged Sharepoint site. Thanks in advance!

Reply
Murat Sözen Says:

July 1, 2009 at 9:32 am
Many thanks for the guidance and clear instructions.

When needed, this is a life saver. Your article wll deserves all the appreciation. Wishing you the best, thanks again.

Reply
Edwill Says:

October 30, 2009 at 12:02 pm
Thank you, I found myself in the perfect situation for this…

Reply
Abhijeet Khake Says:

March 10, 2010 at 8:18 am
Hi,
How can write sql query to get documents for single user,
like select * from docs where user=’xyz’

how can i join users to docs table and retrieve documents user by user.

Thanks
Abhijeet Khake

Reply
Mateusz Says:

April 16, 2010 at 1:25 pm
It worked for me with WSS 3.0 and SharePoint server 2007 (after changing the SQL command of course).

Thanks a lot!

Reply
Jakob Øjvind Nielsen Says:

April 23, 2010 at 12:01 pm
thanks for the codesnippet. It help me to extract all the documents, on a wss where AD integration was lost.
On wss3 you should use the sql statement

select d.dirName, d.LeafName, s.content from alldocs d join alldocstreams s on d.id = s.id where Content is not NULL

instead.

Thanks
Jakob

Reply
Mateusz Says:

April 28, 2010 at 11:19 am
I just had PathTooLongException. I handled it with this code. Maybe someone my need it too:

if (DirName.Length > 247)
{
Console.WriteLine(“Foler name: ” + DirName + ” is too long\nChanged folder name to: TOO LONG”);
DirName = “TOO LONG”;
}

if (DirName.Length + LeafName.Length > 258)
{
Console.WriteLine(“Path name: ” + DirName + “/” + LeafName + ” too long.\nChanged folder name to: TOO LONG”);
DirName = “TOO LONG”;
}

if (!Directory.Exists(DirName))
{
Directory.CreateDirectory(DirName);
//Console.WriteLine(“Tworzę folder: ” + DirName);
}

Reply
Rajnikant Sharma Says:

June 15, 2010 at 6:54 am
Hi Friend,

The link provided to download the project is not working:

http://go2.wordpress.com/?id=725X1342&site=plaxoed.wordpress.com&url=http%3A%2F%2Fwww.plaxoed.com%2FDoxoEater.zip&sref=http%3A%2F%2Fplaxoed.wordpress.com%2F2006%2F11%2F02%2Flittle-tool-to-extract-all-files-out-of-sharepoint%2F

it gives 404 error

Reply
Hb Says:

August 23, 2010 at 8:28 am
It works for me, BUT when i open an exported file, for example an .Xls-file; the spreadsheet is filled with ASCII.

The same goes for pdf, .doc(x), .ppt(x) and all other files.

Missing file-header or what??

Reply
tina Says:

September 5, 2010 at 9:06 am
Hi!

I am just collecting information about sharepoint service – I think now they are called Sharepoint 2010 foundation?

Can I use this script for this version, too?

I am a bit nervous about loosing my data und would use the software only if I had such a script

Btw. does anyone of you use Sharepoint with linux or mac, too? I’m not shure if this works allthough one can use Webdav.

Reply
Diego Says:

January 18, 2011 at 3:22 pm
Hi! Thanks for hints and code! I recovered by a server down just editing your code to attach directly to .MDF File.

THANKS!
Diego

Reply