Little tool to extract all files out of Sharepoint

November 2, 2006

We’ve been using Microsoft Sharepoint here at Plaxo to organize our projects and files. For those who have used Sharepoint, you know it’s a love/hate relationship 🙂

Anyways, we needed to get a copy of all the files in our Sharepoint site recently. Looking around on the web, we found a few solutions, but couldn’t really find one that did exactly what we wanted. So, I wrote up a quick little app to grab all the files we needed out of Sharepoint, while preserving Sharepoint’s directory structure. It’s a command line tool and it doesn’t have any options.

It’s really not that hard; here’s the source in case you’re interested:

using System;
using System.Collections.Generic;
using System.Text;
using System.Data;
using System.Data.SqlClient;
using System.IO;
 
// replace this string with your Sharepoint content DB connection string
string DBConnString = “Server=YOURSHAREPOINTSERVER;Database=CONTENTDATABASE;Trusted_Connection=True;”;
 
// create a DB connection
SqlConnection con = new SqlConnection(DBConnString);
con.Open();
 
// the query to grab all the files.
// Note: Feel free to alter the LeafName like ‘%.extension’ arguments to suit your purpose
SqlCommand com = con.CreateCommand();
com.CommandText = “select DirName, LeafName, Content from Docs where (LeafName like ‘%.doc’ or LeafName like ‘%.xls’ or LeafName like ‘%.pdf’ or LeafName like ‘%.ppt’) and Content is not NULL”;
 
// execute query
SqlDataReader reader = com.ExecuteReader();
 
while (reader.Read())
{
    // grab the file’s directory and name
    string DirName = (string)reader[“DirName”];
    string LeafName = (string)reader[“LeafName”];
 
    // create directory for the file if it doesn’t yet exist
    if (!Directory.Exists(DirName))
    {
        Directory.CreateDirectory(DirName);
        Console.WriteLine(“Creating directory: “ + DirName);
    }
 
    // create a filestream to spit out the file
    FileStream fs = new FileStream(DirName + “/” + LeafName, FileMode.Create, FileAccess.Write);
    BinaryWriter writer = new BinaryWriter(fs);
 
    // depending on the speed of your network, you may want to change the buffer size (it’s in bytes)
    int bufferSize = 1000000;
    long startIndex = 0;
    long retval = 0;
    byte[] outByte = new byte[bufferSize];
 
    // grab the file out of the db one chunk (of size bufferSize) at a time
    do
    {
        retval = reader.GetBytes(2, startIndex, outByte, 0, bufferSize);
        startIndex += bufferSize;
 
        writer.Write(outByte, 0, (int)retval);
        writer.Flush();
    } while (retval == bufferSize);
 
    // finish writing the file
    writer.Close();
    fs.Close();
 
    Console.WriteLine(“Finished writing file: “ + LeafName);
}
 
// close the DB connection and whatnots
reader.Close();
con.Close();

Here’s my VS.NET 2005 project file. If you need help with the code, please let me know.

45 Responses to “Little tool to extract all files out of Sharepoint”

  1. rose Says:

    hmm, i think you could have just used sharepoint in the folder view, then dragged everything onto your desktop 🙂

    ah, how i wished we used sharepoint here at work…


  2. out of curiosity, why did you need to do this?

  3. Chupes Says:

    hello. pardon my newbieness here, but is this a sql script?o what other language?

  4. markjen Says:

    Hi Chupes,

    Yup, this is a simple SQL query that extracts all the files in sharepoint and writes them to the file system.

    It’s written in C#, but it’s .NET, so you could easily convert it to VB.NET, C++.NET, etc.

  5. m Says:

    you rock

  6. Steven Borg Says:

    Nice work!!! Yes, the theory is that you can just drag and drop from a folder view, however, that sucks when you’re trying to automate the retrieval! Or, like in our case, where the entire WSS 2.0 site got completely hosed during the upgrade to WSS 3.0 and we could see all the files, but not download them. This script was a lifesaver!

  7. Eric Says:

    Mark, When I parse or execute this script in the Query Analyzer I get the following error, and I cannot pinpoint the problem. Any suggestions? Because I too need to pull documents from a wss2 database that is not in use anymore.

    [Microsoft][ODBC SQL Server Driver]Syntax error or access violation

  8. Eric Says:

    Ignore the message above. I now have the code and your C# project file loaded in vs.net. Running the debug led me to find an error that states the Content column dosen’t exist. This database is the Moss 07 beta, so I am not sure if that is the issue. Since the code just checks to see if it is null, I took it out.

    Now I think I am so close becasue it will actually create a directory structure, but errors on pulling the file over in “chunks”. The debug says the “Index was outside the bounds of the array” and it is point to the outByte portion of this statement:

    retval = reader.GetBytes(2, startIndex, outByte, 0, bufferSize)

    I’m still looking for the answer myself and I am hoping I can get this to work. Any help would be great, thanks.

  9. Eric Says:

    I have modified the code to work with SharePoint 07. Since the Content is stored in a new table with 07, the query piece needs to be a little different. Just replace line 24 with the sting below. It worked perfectly for me after that.

    Find
    com.CommandText = “select DirName, LeafName, Content from Docs where (LeafName like ‘%.doc’ or LeafName like ‘%.xls’ or LeafName like ‘%.pdf’ or LeafName like ‘%.ppt’) and Content is not NULL”;

    Replace With
    com.CommandText = "select [AllDocs].[DirName], [AllDocs].[LeafName], [AllDocStreams].[Content] from [AllDocs],[AllDocStreams] where (LeafName like '%.doc' or LeafName like '%.xls' or LeafName like '%.pdf' or LeafName like '%.ppt') and [AllDocStreams].[Content] is not NULL and [AllDocs].[Id] = [AllDocStreams].[Id]";

  10. mINDAUGAS Says:

    Hi, is there any way to use your tool for default Windows Internal Database msde i’m trying to connect to database seting Serter=fs\microsoft##ssee with no sucess. fs -is the same host there installated sharepoint service and internal database .
    thanks.

  11. bud Says:

    you da man!sharepoint config db corrupted but was able to recover our files with your script!thank you! thank you!

  12. JeffD Says:

    Thanks!!! The WSS 2>3 converter left my database (with over 2 gigs of files) hosed. Your script got me back an intact file system full of my files. Don’t know how to say thanks enough.

  13. wbrproductions Says:

    markjen and #9 are the shiznizkydoodle! Thanks, folks, you just saved my butt. That worked like a goddam lucky charm. 2 weeks of frustration dissolved by a 15 second script.

  14. Alex Guzun Says:

    My MOSS2007 just gone to hell… got you app. Fantastic job…
    Is it possible to get lists? We had a big amount of data in lists 😦
    To get wiki, i think it is enough to add *.html

    Thank you very much in advance.

  15. Alex Guzun Says:

    Another problem 😦 It can’t created looong directories. Program stops when tried to create dir witch full path contain 194 symbols 😦 Is it possible to made something (at least) to pass that dir, not to fail?

  16. Alex Guzun Says:

    Any solution to skip ‘long directory’?

    Unhandled Exception: System.IO.PathTooLongException: The specified path, file name, or both are too long. The fully qual
    ified file name must be less than 260 characters, and the directory name must be less than 248 characters.
    at System.IO.Path.NormalizePathFast(String path, Boolean fullCheck)
    at System.IO.Path.GetFullPathInternal(String path)
    at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShar
    e share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy)
    at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access)
    at DoxoEater.Program.Main(String[] args) in c:\!!!\DoxoEater\DoxoEater\Program.cs:line 44

  17. markjen Says:

    Hi Alex, you could run a quick check on the path size before line 44 in program.cs. Not sure what you’d do with the files that were supposed to be in the long-name directory though. Perhaps you could write a small function that would shorten the directory name?

  18. Happy Says:

    Thanks a million. You program saved our documents.
    I have used several days to try to get Moss to do this. Now I can delete all and make a new installment.

    thanks
    🙂

  19. Ryan Says:

    How can I get the latest version of this code? I have been reading through this and there seems like some bugs have been fixed.

    What would make this perfect for me is if there was logging that confirms each file did or did not get written to a drive.

    Does anyone know of a program similar to this that I can purchase?

  20. ryan Says:

    It’s me again. I have been using this script and it has worked very well.

    Does anyone know if there is a simular script to extract List data?

  21. joe Says:

    Is there any code out there to unload/extract calendar data in a script for some automation purposes I have?

  22. Wordpress Says:

    developmentchinese jacketcompetition Chinesedressdevelopment capitalsilk clothesStreet grilchinese dresscultural womancheongsamhistory ÓÍÕôÆøÕæ¿Õ±ÃÓ°Ïì 0.8833553045874069

  23. Dwain Says:

    I thought I’d pile on and say THANK YOU – this is a great little utility. I added an args parameter so I could get the documents out of any specific site, but your code was a great start. Thanks!

  24. abhijeet Says:

    Getting errors on pulling the file over in “chunks”. The debug says the “Index was outside the bounds of the array” and it is point to the outByte portion of this statement:

    retval = reader.GetBytes(2, startIndex, outByte, 0, bufferSize)

    earlier eric posted the same problem but here i haven’t see any comment about that, i f any one have idea about this please help me..

    I’m still looking for the answer myself and I am hoping I can get this to work. Any help would be great, thanks.

  25. Paul Says:

    This is a great script, but you will notice that this does not maintain version information. In order to do so you need the following commands

    select [DirName], [LeafName], [Content], [UIVersion] as Version from [Docs] where [Content] is not NULL

    select [AllDocs].[DirName], [AllDocs].[LeafName], [AllDocStreams].[Content],[AllDocs].[UIVersionString] as Version from [AllDocs],[AllDocStreams] where [AllDocStreams].[Content] is not NULL and [AllDocs].[Id] = [AllDocStreams].[Id]

    Add the following to the reader.Read loop

    string Version = (string)reader[“Version”].ToString();

    //Add this below the reader assignment of leafName
    string leafName= “[” + Version.ToString() + “]” + LeafName;

    I think that this will address all scenarios where a file is over written. As a precaution I have also added some logging in my script to check and see if the exists before I extract it. I recommend that you do the same.

  26. Wes Says:

    Thanks, works very well save one issue for me. I can dump doc, xls, etc but we had a folder of images (.jpg) that when I attempt to dump throws “Path cannot be the empty string or all whitespace” at Directory.CreateDirectory(DirName). I’m a complete code newb, is there anyway to add a modifier to the directory name so that it cannot be empty or something like that? Thanks!


  27. […] Filed under: SharePoint — Keith Richie @ 11:03 pm My good friend Todd Klindt pointed me to this posting written by Mark Jen which showed some sample code to export site content from a SharePoint Content […]


  28. Hi there!

    Obviously, I am missing somehting here, I cant get the script to run, I tried putting it in SQL query editor, get a WACK of errors with syntax, so that cant be it, I tried compliling with CSC, again no go, so what is it that I am missing?

    Thanks in advance!


  29. I got it to run!! Since I wasnt to up on my C# application building, the last link (from Keith) have the project structure, and with some copy pasta, I was able to complile! (“‘s needed to be converted to “‘s )

    Thanks, you saved our “bacon” !!!

  30. frank Says:

    Thanks for the information, i,m new here looking for more info on share poit and shere file! Thanks again!

  31. roger Says:

    Excellent !

    Excactly what we needed after our sharepointserver tanked…. 🙂

    thanks a bunch

  32. Nathan B Says:

    I’m a bit loathe to admit I had to use this utility, but sometimes disasters happen.

    Onto my answer:

    People were having issues with the “Index was Outside the bounds of the array.” of the GetBytes command. Googling answers for this told me that the GetBytes command gets the bytes of a value in the array.

    Since I used PLAXOED! and KRichie’s blog, I had an amalgamation of the two authors’ code.
    It turns out the GetBytes command’s first parameter corresponds to which index of an array to get the bytes from. In this case, the array is a row of our SQL query at the beginning–
    Because I had ‘4’ from Krichie’s code, I was trying to get the value of non-existent column in the array from the PLaxoed! code.

    Changing this number to 2 (the index corresponding to the Content column) fixed it for me.

    Great code, thank goodness!

  33. Dave Says:

    This was quite useful.
    I used this SQL instead to get all files.

    “SELECT DirName, LeafName, Content FROM Docs WHERE Type = 0 and Content is not NULL”;

  34. brian Hman Says:

    You know I don’t usually leave comments on code that I find online, but this is VERY VERY useful. Thank you so much for your post!

    Brian Hman

  35. Robert A Says:

    Hello! I’m going to ask a stupid question, how do I run this script?? Do I need Visual Studio and compile it and run it as an executable? Could someone please explain this step by step. Really need to do this on my damaged Sharepoint site. Thanks in advance!

  36. Murat Sözen Says:

    Many thanks for the guidance and clear instructions.

    When needed, this is a life saver. Your article wll deserves all the appreciation. Wishing you the best, thanks again.

  37. Edwill Says:

    Thank you, I found myself in the perfect situation for this…

  38. Abhijeet Khake Says:

    Hi,
    How can write sql query to get documents for single user,
    like select * from docs where user=’xyz’

    how can i join users to docs table and retrieve documents user by user.

    Thanks
    Abhijeet Khake

  39. Mateusz Says:

    It worked for me with WSS 3.0 and SharePoint server 2007 (after changing the SQL command of course).

    Thanks a lot!


  40. thanks for the codesnippet. It help me to extract all the documents, on a wss where AD integration was lost.
    On wss3 you should use the sql statement

    select d.dirName, d.LeafName, s.content from alldocs d join alldocstreams s on d.id = s.id where Content is not NULL

    instead.

    Thanks
    Jakob

  41. Mateusz Says:

    I just had PathTooLongException. I handled it with this code. Maybe someone my need it too:

    if (DirName.Length > 247)
    {
    Console.WriteLine(“Foler name: ” + DirName + ” is too long\nChanged folder name to: TOO LONG”);
    DirName = “TOO LONG”;
    }

    if (DirName.Length + LeafName.Length > 258)
    {
    Console.WriteLine(“Path name: ” + DirName + “/” + LeafName + ” too long.\nChanged folder name to: TOO LONG”);
    DirName = “TOO LONG”;
    }

    if (!Directory.Exists(DirName))
    {
    Directory.CreateDirectory(DirName);
    //Console.WriteLine(“Tworzę folder: ” + DirName);
    }

  42. Hb Says:

    It works for me, BUT when i open an exported file, for example an .Xls-file; the spreadsheet is filled with ASCII.

    The same goes for pdf, .doc(x), .ppt(x) and all other files.

    Missing file-header or what??

  43. tina Says:

    Hi!

    I am just collecting information about sharepoint service – I think now they are called Sharepoint 2010 foundation?

    Can I use this script for this version, too?

    I am a bit nervous about loosing my data und would use the software only if I had such a script

    Btw. does anyone of you use Sharepoint with linux or mac, too? I’m not shure if this works allthough one can use Webdav.

  44. Diego Says:

    Hi! Thanks for hints and code! I recovered by a server down just editing your code to attach directly to .MDF File.

    THANKS!
    Diego


Leave a comment